Chapter 1


INTRODUCTION 


The  evaluation  of  one  or  more  specific  definite  integrals  often 
constitutes  a crucial  segment  of  a scientific  or  engineering  research 
project.  As  technical  research  becomes  more  complex,  the  integrals 
encountered  tend  to  be  correspondingly  more  complicated,  and  so  more 
resistant  to  treatment  by  standard  mathematical  methods.  In  particular, 
scientists  and  engineers  are  increasingly  being  confronted  with  the  task 
of  evaluating  multi-dimensional  integrals,  an  enterprise  which  often 
overtaxes  even  highly  refined,  computer-oriented  numerical  quadrature 
techniques.

In  the  late  1940's  a novel  technique  for  numerically  evaluating 
integrals  was  suggested  by  E.  Fermi,  J.  von  Neumann  and  S.  Ulam.  This 
technique,  which  was  termed  "Monte  Carlo"  because  of  its  reliance  upon 
random  numbers,  did  not  win  immediate  widespread  acceptance  in  the  scienti- 
fic community.  Most  scientists  apparently  just  ignored  Monte  Carlo, 
probably  because  it  seemed  rather  foolish  to  suppose  that  one  could  gain 
useful  knowledge  about  a well-defined  and  perfectly  deterministic  inte- 
gral by  playing  some  contrived  game  of  chance.  And  of  those  workers 
who  took  the  trouble  to  inform  themselves  more  fully  on  the  subject, 
many  found  that  Monte  Carlo,  as  it  was  understood  and  applied  in  the 
1950's, simply was not as efficient for their particular problems as were
the more conventional numerical methods.


To be sure, the Monte Carlo method has its limitations, and it is
by  no  means  an  appropriate  tool  for  many  problems.  However,  owing  to  an 
increased  understanding  of  and  improvement  in  Monte  Carlo  techniques,  as 
well  as  the  development  of  faster  digital  computers,  the  class  of  integrals 
which  are  now  amenable  to  Monte  Carlo  is  fairly  large,  and  includes  many 
of  the  unwieldy  multi-dimensional  integrals  which  scientists  and  engineers 
often  encounter  in  their  research.  For  this  reason  the  Monte  Carlo  approach 
undoubtedly  deserves  a wider  recognition  in  the  scientific  and  engineering 
community  than  it  presently  has. 

There  exists  a fairly  extensive  body  of  literature  on  Monte  Carlo. 
Currently  the  most  comprehensive  work  is  a book  by  Hammersley  and  Handscomb 
(Ref.  1);  a less  ambitious  but  somewhat  more  readable  work  is  an  article 
by Fluendy (Ref. 2). However, in this writer's opinion the "standard work"
on  Monte  Carlo  has  yet  to  be  written.  This  is  probably  because  all  of  its 
variations  and  possibilities  have  not  yet  been  brought  into  a completely 
understood and totally unified picture by any one practitioner of the art.

In  addition,  it  is  usually  easier  to  do  Monte  Carlo  in  some  specific 
instance  than  it  is  to  write  (or  read)  about  it  in  general  terms.  Un- 
fortunately though,  it  is  also  easy  to  do  Monte  Carlo  inefficiently — or 
worse  still,  incorrectly — so  a good  understanding  of  the  generalities 
is  rather  essential. 

This  writer  has  used  Monte  Carlo  as  a computational  tool  in  two  areas 
of  physics,  namely,  elementary  particle  physics  (Ref.  3)  and  classical 
kinetic theory (Ref. 4). This limited experience has by no means rendered
the author an expert in all facets of Monte Carlo; however, it has
suggested a pedagogical approach to the subject which is perhaps more
transparent to scientists than are the standard presentations, which usually
tend to be rather deeply couched in the technical language of statistics.

The  aim  of  this  monograph,  therefore,  is  not  to  provide  a definitive 
and  comprehensive  treatment  of  all  aspects  of  Monte  Carlo;  rather,  its 
purpose  is  to  present  the  basic  principles  of  the  conventional  Monte  Carlo 
method  for  estimating  integrals,  in  a manner  that  will  convey  an  "in- 
tuitive feeling"  for  how  and  why  the  method  works.  An  intuitive  rapport 
with  the  Monte  Carlo  approach  is  important,  because  this  enables  one  to 
more  easily  identify  which  features  of  a given  integral  will  give  rise 
to  difficulties,  as  well  as  which  features  can  be  exploited  for  a gain 
in  computing  efficiency.  More  often  than  not,  this  kind  of  insight  is 
what  spells  the  difference  between  success  and  failure  in  obtaining  a 
sufficiently  accurate  numerical  estimate  for  a given  integral. 

From  the  viewpoint  of  a scientist,  the  basic  idea  behind  Monte  Carlo 
can  probably  be  best  explained  through  a familiar  example  from  statistical 
mechanics: Suppose we have a gas composed of very many molecules of mass
m in thermal equilibrium at absolute temperature T. If f(v) is any function
of the molecular speed v (e.g., f(v) could be the molecular kinetic energy
mv^2/2, or the molecular speed v itself), then the average <f> of f(v) for
these gas molecules may be defined as

\[ \langle f \rangle = \frac{1}{N} \sum_{i=1}^{N} f(v_i), \qquad N \gg 1 \tag{1.1} \]

where v_1, v_2, ..., v_N are the speeds of N randomly chosen molecules.

Now, we can evaluate <f> without actually polling N randomly selected
molecules by making use of the Maxwell-Boltzmann Law. According to
this law, the probability that any molecule will have a speed between
v and v+dv is

\[ P(v)\,dv = (a/\pi)^{3/2}\, 4\pi v^2 \exp(-a v^2)\,dv \tag{1.2} \]

where a = m/2kT, k being Boltzmann's constant. It follows that the
contribution to the sum on the right of (1.1) coming from molecules with speeds
between v and v+dv will be N P(v)dv f(v). Summing (integrating) over all
dv-intervals thus gives the quantity \sum_i f(v_i), and (1.1) yields

\[ \langle f \rangle = \int_0^{\infty} f(v)\, P(v)\,dv \tag{1.3} \]

The  point  here  is  that,  in  statistical  mechanics,  we  evaluate  averages 
of  the  kind  on  the  right  side  of  (1.1)  by  actually  computing  definite 
integrals  of  the  kind  on  the  right  side  of  (1.3). 

The  basic  idea  behind  Monte  Carlo  is  simply  to  turn  this  procedure 
around.  Thus,  suppose  that  for  some  unrelated  reason  we  wanted  to  evaluate 
the  integral  on  the  right  side  of  (1.3),  where  P(v)  is  given  by  (1.2) 
with  a specific  numerical  value  for  a,  and  where  f(v)  is  some  given 
function  which  is  so  complicated  that  we  are  unable  to  carry  out  the 
integration analytically. Now, if we could somehow obtain a set of
numbers v_1, v_2, ..., v_N that mimic the speeds of N randomly chosen gas
molecules in thermal equilibrium (with m and T values appropriate to the
given value of a), then we could evidently evaluate the integral in
question simply by averaging f(v) over these N v_i-values. This would in
fact constitute a "Monte Carlo evaluation" of the integral: In Monte
Carlo, we evaluate definite integrals of the kind on the right side of
(1.3) by actually computing averages of the kind on the right side of
(1.1).
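
As a rough illustration of this reversal, the sketch below estimates the integral (1.3) for f(v) = v^2 by computing the average (1.1) over simulated speeds. It assumes Python's standard `random` module as the source of random numbers and, as a shortcut not taken from the text, obtains speeds distributed according to (1.2) by drawing three Gaussian velocity components of variance 1/(2a). The function names and the value a = 1.5 are hypothetical.

```python
import math
import random

def sample_speed(a, rng):
    """Draw one molecular speed v distributed according to (1.2).

    Assumption (not from the text): each Cartesian velocity component is
    Gaussian with variance 1/(2a); the speed is the length of that vector.
    """
    sigma = math.sqrt(1.0 / (2.0 * a))
    vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
    return math.sqrt(vx * vx + vy * vy + vz * vz)

def monte_carlo_average(f, a, n, seed=0):
    """Estimate the integral (1.3) by the finite-N average (1.1)."""
    rng = random.Random(seed)
    return sum(f(sample_speed(a, rng)) for _ in range(n)) / n

a = 1.5                                   # illustrative value of m/2kT
estimate = monte_carlo_average(lambda v: v * v, a, 200_000)
exact = 3.0 / (2.0 * a)                   # analytic value of <v^2> for (1.2)
print(estimate, exact)
```

For this choice of f the integral is known analytically, <v^2> = 3/(2a), which gives a simple check on the estimate.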

From  the  foregoing  rough  description  of  the  Monte  Carlo  approach, 
several  questions  naturally  arise.  The  first  and  most  obvious  is,  can 
we obtain the required v_i-values without actually measuring the speeds of
randomly selected gas molecules? More generally, can we obtain v_i-values
appropriate to P functions different from the one given in (1.2)? These
matters  will  be  addressed  in  Chapter  2,  where  we  discuss  in  some  detail 
how  sets  of  random  points  are  specified  and  how  they  can  be  constructed. 

The  second  question  concerns  the  accuracy  of  a Monte  Carlo  calculation. 
If the values v_1, v_2, ..., v_N are the speeds of N randomly chosen gas
molecules, then we surely cannot expect (1.1) to yield a unique result.
In fact, (1.1)-(1.3) are strictly valid only in the limit N → ∞. We
therefore need to know what sort of uncertainty in our result will be
occasioned by calculating (1.1) with N finite. This question will be
discussed in detail in Chapter 3, where we shall develop the Monte Carlo
method for estimating definite integrals in a much more careful way than
we did above.
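
The lack of a unique result at finite N can be seen in a small numerical experiment. The sketch below is only an illustration under assumed choices: the integrand r^2 on the unit interval, Python's `random` module, and the hypothetical helper names `estimate` and `spread`. It repeats the average (1.1) many times and reports how much the results scatter for two values of N.

```python
import random
import statistics

def estimate(n, rng):
    """Finite-N average, as in (1.1), of f(r) = r**2 for r uniform on [0, 1)."""
    return sum(rng.random() ** 2 for _ in range(n)) / n

def spread(n, repeats=200, seed=0):
    """Standard deviation of the estimate over independent repetitions."""
    rng = random.Random(seed)
    return statistics.stdev([estimate(n, rng) for _ in range(repeats)])

small_n, large_n = 100, 10_000
spread_small = spread(small_n)
spread_large = spread(large_n)
print(spread_small, spread_large)
```

The scatter should come out visibly smaller for the larger N; how it depends on N is the kind of question the text defers to Chapter 3.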

Finally, there is the following question: Once the uncertainty in
a given Monte Carlo calculation has been determined, is there any way
of modifying the procedure so as to reduce this uncertainty? One
intuitively obvious way of doing this would be to simply increase N, but
clearly the time available for computation will impose an effective upper
limit on the size of N. It turns out that, depending on the specifics
of  the  integral  in  question,  one  usually  can  find  ways  of  reducing  the  un- 
certainty without  significantly  increasing  the  computing  time.  In  Chapter  4 
we  shall  describe  some  of  these  so-called  "variance  reducing"  techniques. 

We  mentioned  that  this  report  will  concentrate  on  the  "conventional" 
Monte  Carlo  method  of  evaluating  integrals.  As  implied,  there  is  a some- 
what unconventional  Monte  Carlo  method;  this  alternate  approach  makes  use 
of  the  mathematical  concept  of  a "Markov  chain"  or  a "Markovian  random 
walk",  and  has  met  with  considerable  success  in  certain  areas  of  statistical 
mechanics.  It  is  not  our  purpose  in  this  report  to  discuss  in  detail  the 
Markov  chain  Monte  Carlo  method  for  calculating  multi-dimensional  integrals; 
however,  in  order  to  give  the  reader  some  idea  of  what  is  involved,  as  well 
as  some  guidance  to  the  literature,  we  have  included  a brief  appendix  on 
this subject (Appendix I) at the end of this report.

We  shall  try  in  this  report  to  avoid  as  much  as  possible  the  technical 
jargon  of  statistics,  but  we  shall  nevertheless  attempt  to  maintain  a 
reasonable  level  of  precision  and  rigor.  We  assume  at  the  outset  that  the 
reader is acquainted with the common (albeit not universal) view of
"probability" as the ratio of the number of trials with a favorable outcome
to the total number of trials, taken in the limit of infinitely many
trials. From this notion one easily deduces the addition and
multiplication laws for probabilities:

Addition Law: If p_1 and p_2 are the probabilities for the occurrence
of two mutually exclusive events 1 and 2, then the probability for the
occurrence of either 1 or 2 in any one trial is p_1 + p_2.

Multiplication Law: If p_1 is the probability for the occurrence of
event 1, and p_{21} the probability for the occurrence of event 2 when
event 1 occurred on the previous trial, then the probability for the
occurrence of first 1 and then 2 in any two successive trials is
p_1 p_{21}.

These  and  other  primitive  notions  about  probabilities  will  be  invoked 
frequently  throughout  our  discussion  of  the  Monte  Carlo  procedure. 




Chapter  2 


SETS  OF  RANDOM  POINTS 


2-1. Specifying Sets of Random Numbers

All Monte Carlo applications involve the use of at least one set
of random numbers {x_i} distributed according to some predetermined
probability density function P(x). By these terms we mean an
inexhaustible set of real numbers from which we may "draw" sequential elements
x_1, x_2, x_3, ..., such that

\[ P(x)\,dx = \text{probability that any } x_i \text{ will lie between } x \text{ and } x+dx. \tag{2.1} \]

The numbers in {x_i} are considered "random" because each draw can produce
any real number x, provided P(x) ≠ 0, and it is not possible to say
beforehand what the drawn number will be. However, to say that the numbers
in {x_i} are "random" is not to say that they are "unbiased". Indeed, the
numbers are quite definitely biased in the sense that, in the limit of
infinitely many draws, a normalized frequency histogram of the x_i's
will coincide with the curve P(x)-versus-x.

A set of random numbers {x_i} is specified as completely as is possible
by its probability density function P(x). However, it is often convenient
to work with its probability distribution function F(x), which is defined
in terms of P(x) by

\[ F(x) \equiv \int_{-\infty}^{x} P(x')\,dx' \tag{2.2} \]


In light of (2.1), (2.2) says that F(x) is the "sum" of the probabilities
for x_i to fall inside each infinitesimal interval between -∞ and x;
by the addition law of probabilities, F(x) may thus be interpreted as the
probability that x_i will be less than x.

Since x_i will surely be less than ∞, we have the following
normalization property:

\[ \int_{-\infty}^{\infty} P(x')\,dx' = F(\infty) = 1 \tag{2.3} \]

Another property of P(x), which derives directly from its definition
(2.1), is that it never be negative:

\[ P(x) \ge 0 \quad \text{for all } x \tag{2.4} \]

It follows from (2.2)-(2.4) that the distribution function F(x) rises
from the value 0 at x=-∞ to the value 1 at x=+∞ in a non-decreasing way.
Indeed, any non-negative, single-valued function of x which bounds
a unit area with the x-axis can serve as a probability density function,
defining a set of random numbers. Similarly, any differentiable function
of x which rises from the value 0 at x=-∞ to the value 1 at x=+∞
without ever decreasing can serve as a probability distribution function,
defining a set of random numbers.

The distinction between a probability density function and a
probability distribution function is quite important in Monte Carlo work.†
F(x) is the integral curve of P(x), and conversely P(x) is the
derivative curve of F(x). A plot of P(x) and F(x) for a hypothetical set of
random numbers {x_i} is shown in Fig. 1, where we have tried to illustrate
the properties and relationships developed above. If P(x) is zero below
x=a and above x=b, then F(x) is zero below x=a and unity above x=b. The
total area under the P(x) curve is unity, while the area under the P(x)
curve between x and x+dx is numerically equal to the probability that a
number drawn from {x_i} will lie between x and x+dx. The ordinate at x
on the F(x) plot is the probability that a number drawn from {x_i} will
be less than x. Regions on the x-axis of high likelihood are distinguished
by high P(x)-values and steeply rising F(x)-values; regions of low
likelihood are distinguished by low P(x)-values and nearly constant F(x)-values.
Since F(x) is a probability, it is always a pure number between 0 and 1.
P(x) is not a probability; however, P(x)dx is, so P(x) always has dimensions
of 1/x.

† The function P(v) in (1.2), or its closely related Cartesian counterpart,
is usually referred to as the "Maxwell-Boltzmann distribution function";
this is a rather unfortunate designation since it is obviously a probability
density function and not a probability distribution function.

F(x)  is  sometimes  referred  to  as  the  "cumulative  distribution  function". 

We  shall  hereafter  refer  to  P(x)  and  F(x)  more  simply  as  the  "density 
function"  and  "distribution  function"  respectively. 

FIGURE 1. Illustrating the relationship between the density function P(x)
and the distribution function F(x) for a hypothetical set of random
numbers {x_i}.

FIGURE 2. Transforming a set of random numbers {x_i} into a set of random
numbers {y_i} through a function y=f(x).

Suppose a given set of random numbers {x_i} with density function
P_1(x) is transformed into a new set of random numbers {y_i} by applying to
each element of {x_i} the transformation

\[ y = f(x) \tag{2.5a} \]

What will be the density function P_2(y) of the new set {y_i}? If, as is indicated
in Fig. 2, the interval (y,y+dy) is the image of the interval (x,x+dx)
under (2.5a), so that

\[ dy = \left|\frac{dy}{dx}\right| dx = |f'(x)|\,dx \tag{2.5b} \]

then clearly the probability for finding y_i inside (y,y+dy) is the same
as the probability for finding x_i inside (x,x+dx):

\[ P_2(y)\,dy = P_1(x)\,dx \tag{2.6} \]

Inserting (2.5b) we therefore conclude that

\[ P_2(y) = P_1(x)/|f'(x)| \tag{2.7} \]

where x on the right side of (2.7) is now to be regarded as a function of
y through the inverse of (2.5a): x=f^{-1}(y). The important result (2.7)
shows that the density of random points y_i around y=f(x) will be greater
than, equal to, or less than the density of random points x_i around x
accordingly as the local slope |dy/dx| of the transformation curve is less
than, equal to, or greater than unity; these features can be appreciated
geometrically from Fig. 2. If the inverse function x=f^{-1}(y) is multivalued,
so that a given dy-interval is populated from several dx-intervals, then
the right sides of (2.6) and (2.7) will evidently have to be summed over
all contributing intervals.
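
The transformation rule (2.7) can be checked numerically on a simple case. The sketch below is an illustration under assumed choices, not an example from the text: it takes x uniform on (0, 1), so that P_1(x) = 1, applies y = x^2, and compares the observed fraction of y-values in a narrow interval with P_2(y)dy = dy/(2 sqrt(y)). The helper names and the interval are hypothetical.

```python
import math
import random

def p2(y):
    """Density predicted by (2.7) for y = x**2 with x uniform on (0, 1):
    P1(x) = 1 and |f'(x)| = 2x, evaluated at x = sqrt(y)."""
    return 1.0 / (2.0 * math.sqrt(y))

def fraction_in_interval(y_lo, y_hi, n=400_000, seed=0):
    """Empirical probability that y = x**2 lands in [y_lo, y_hi)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if y_lo <= rng.random() ** 2 < y_hi)
    return hits / n

y_lo, y_hi = 0.25, 0.26
observed = fraction_in_interval(y_lo, y_hi)
predicted = p2(0.5 * (y_lo + y_hi)) * (y_hi - y_lo)   # P2(y) dy at the midpoint
print(observed, predicted)
```

Here the slope 2x is close to unity near x = 0.5, so the density of y-points near y = 0.25 is close to the density of x-points, as (2.7) suggests.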
2-2. The Set {r_i}


Of special importance in Monte Carlo work is the set of random
numbers distributed uniformly over the unit interval, the elements
of which we shall always denote by r_i. More precisely, the set {r_i} is
defined by the density function

\[ P(r) = \begin{cases} 0, & \text{for } r < 0 \\ 1, & \text{for } 0 \le r \le 1 \\ 0, & \text{for } r > 1 \end{cases} \tag{2.8a} \]

or the corresponding distribution function [cf. (2.2)]

\[ F(r) = \begin{cases} 0, & \text{for } r < 0 \\ r, & \text{for } 0 \le r \le 1 \\ 1, & \text{for } r > 1 \end{cases} \tag{2.8b} \]

Thus, the set {r_i} is distinguished by the facts that: (i) the probability
for a randomly drawn r_i to lie in any dr-interval between 0 and 1 is
equal to dr; and (ii) the probability for a randomly drawn r_i to be less
than a given number r between 0 and 1 is equal to r.

The set {r_i} is important in Monte Carlo work for two reasons: First,
there exist many short computer subroutines which are capable of rapidly
generating elements of this set (or more precisely, elements of some
set which simulates {r_i} closely enough for most practical purposes);
and second, it is possible to construct from the elements of the set
{r_i} the elements of a set {x_i} distributed according to any prescribed
density function P(x). In this report we shall not delve into the first
point in any detail. The reason for this omission is that the writing
of computer codes to generate mock elements of the set {r_i} (so-called
"uniform random number generators") is a complicated, fast-changing art
which is best entrusted to experts in statistics and the theory of
numbers. A nice introduction to this subject containing many references
to the literature is the short article by Chambers (Ref. 5); more
detailed treatments may be found in Chapter 3 of Hammersley and Handscomb
(Ref. 1) and Vol. 2 of Knuth (Ref. 6). We shall content ourselves here
with giving only a brief glimpse of the general ideas involved in
generating uniformly distributed "pseudorandom" numbers on a digital computer.

Most uniform random number generators currently in use are based
upon the so-called "multiplicative congruential method". In its simplest
form, this method takes a starting integer N_0 and generates a sequence of
integers N_1, N_2, ..., by means of the recursion relation

\[ N_i \equiv C N_{i-1} \pmod{M} \]

where C and M are predetermined (and usually very large) integers. This
relation means that N_i is set equal to the remainder obtained when C N_{i-1}
is divided by M. Obviously, each N_i will lie between 0 and M, so the
elements

\[ r_i = N_i / M \]

will lie between 0 and 1. It turns out that, provided sufficient care
is taken in choosing the numbers N_0, C and M, the set of numbers {r_i}
obtained from the above algorithm approximates a uniform distribution
of random numbers in the unit interval surprisingly well. What
undesirable correlations the method has (and it certainly does have some)
can be greatly diminished by incorporating a few twists and turns into
the above procedure. However, almost all uniform random number generating
subroutines generally available for digital computers have in common
with the procedure just described the feature that each element r_i
is calculated in an operationally simple way from the result of the
r_{i-1} calculation, and also the feature that r_1 is determined by a
starter number whose value can be changed at will by the user to
generate different, independent "chains" of random numbers. Usually it
is most economical to set up a uniform random number generating
subroutine so that, after an "initializing call" which sets some value for
the starter number N_0, the subroutine will calculate and output one
random number (the next number of the chain) each time it is called by
the main Monte Carlo program.
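
The recursion just described is short enough to sketch directly. The block below is a minimal illustration, not a generator recommended by the text: the constants C = 16807 and M = 2^31 - 1 are the widely used "minimal standard" values and are an assumption here, as are the class and method names. The constructor plays the role of the initializing call, and each later call returns the next number of the chain.

```python
class MultiplicativeCongruential:
    """N_i = C * N_(i-1) (mod M), with r_i = N_i / M.

    C and M are illustrative constants (an assumption; the text gives
    no specific values).
    """
    C = 16807
    M = 2**31 - 1

    def __init__(self, seed=1):
        # "Initializing call": set the starter number N_0, with 0 < N_0 < M.
        if not 0 < seed < self.M:
            raise ValueError("seed must satisfy 0 < seed < M")
        self.state = seed

    def next_int(self):
        """Advance the chain by one step and return N_i."""
        self.state = (self.C * self.state) % self.M
        return self.state

    def next_r(self):
        """Return the next element r_i = N_i / M, which lies in (0, 1)."""
        return self.next_int() / self.M

gen = MultiplicativeCongruential(seed=12345)
sample = [gen.next_r() for _ in range(5)]
print(sample)
```

Changing the seed starts a different chain, while reusing a seed reproduces the same chain exactly, which is convenient when debugging a Monte Carlo program.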

The  author's  recent  Monte  Carlo  work  has  made  use  of  a short 
Fortran  subroutine  designed  especially  for  the  Univac  1108  computer 
by  Marsaglia  and  Bray  (Ref.  7);  their  method  essentially  tries  to  over- 
come some  of  the  correlations  present  in  congruential  generators  by 
mixing  several  such  generators  together.  We  refer  the  reader  to  their 
article and to the previously mentioned works (Refs. 5, 1 and 6) for
further details on the computer-generation of pseudorandom numbers from
a uniform distribution in the unit interval. In the sequel we shall
simply assume that we have easy computer access to a set of numbers
which effectively mimics the set {r_i}; in practice, this is usually
the case.

2-3. The Inversion and Rejection Generating Methods

We turn now to the important problem of how to construct, from a
given set of random numbers {r_i} distributed uniformly on the unit
interval, another set {x_i} distributed according to any prescribed
density function P(x). There are two primary methods for accomplishing
this, which we shall refer to as the inversion method and the rejection
method. We consider first the

Inversion Method: Determine the distribution function F(x)
corresponding to the given density function P(x) [cf. (2.2)]. Then, for
each element r_i from the given set {r_i}, choose x_i by solving the
equation F(x_i)=r_i; i.e., construct the elements of the set {x_i} from
the elements of the set {r_i} according to the formula

\[ x_i = F^{-1}(r_i) \tag{2.9} \]

where F^{-1} is the inverse of the distribution function.

That the set {x_i} constructed according to the foregoing procedure
actually has P(x) as its density function follows from the transformation
theorem proved at the end of Sec. 2-1 [cf. (2.5)-(2.7)]. Thus, if the
set {r_i} with density function P_1(r) is transformed into a new set {x_i}
by the transformation x=F^{-1}(r), then by (2.7) the density function
P_2(x) of the new set is

\[ P_2(x) = P_1(r)/|(F^{-1})'(r)| \]

But the density function of the set {r_i} is just P_1(r)=1 (0≤r≤1);
furthermore, since (dx/dr)=1/(dr/dx), then (F^{-1})'=1/F'. Thus, the
density function of the constructed set {x_i} is

\[ P_2(x) = 1/[1/F'(x)] = F'(x) = P(x) \]

where the concluding equality follows from the definition (2.2).

FIGURE 3. Illustrating the principle of the inversion generating method.

To get some physical insight into the way the inversion method
actually works, consider the hypothetical plot of r=F(x)-versus-x shown
in Fig. 3. Essentially, the inversion method lays out the elements of
the given set {r_i} along the r-axis, and then projects each r_i-element
onto the x-axis through the curve r=F(x). The projection is always
well-defined since F(x) rises from 0 at x=-∞ to 1 at x=+∞ in a
non-decreasing way. If Δr_1 and Δr_2 are two equal-size intervals in 0<r<1,
then they will each contain the same number of elements of the set {r_i},
at least to within random statistical fluctuations, since the numbers in
{r_i} are uniformly distributed over the unit interval. By construction,
then, the respective image intervals Δx_1 and Δx_2 will also contain the
same number of elements of the set {x_i}, again to within random
statistical fluctuations. Now if, as is the case in Fig. 3, the slope
of the curve F(x)-versus-x is greater in Δx_2 than in Δx_1, then Δx_2
will be proportionately smaller than Δx_1, implying that Δx_2 will have
a proportionately greater density of points than Δx_1. But the local
slope of the curve F(x)-versus-x is just the local value of P(x), as is
seen from the definition (2.2). Thus, we see that the density of
x_i-points produced in a given region by the inversion generating method
is proportional to the value of the function P(x) in that region, which
is just as it should be.
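
A case in which F(x) can be inverted in closed form makes the procedure concrete. The sketch below assumes, purely for illustration, the exponential density P(x) = lam exp(-lam x) for x ≥ 0, whose distribution function is F(x) = 1 - exp(-lam x); neither this density nor the names used come from the text, and Python's `random` module stands in for the set of uniform random numbers.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inversion method, x = F^{-1}(r), for P(x) = lam * exp(-lam * x), x >= 0.

    Solving F(x) = 1 - exp(-lam * x) = r gives x = -ln(1 - r) / lam.
    """
    r = rng.random()                  # uniform random number, 0 <= r < 1
    return -math.log(1.0 - r) / lam

rng = random.Random(0)
lam = 2.0                             # illustrative rate parameter
xs = [sample_exponential(lam, rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)                              # should approach 1 / lam
frac_below_1 = sum(x < 1.0 for x in xs) / len(xs)     # should approach F(1)
print(mean, frac_below_1)
```

Each uniform number yields exactly one x-value, so no random numbers are discarded; the cost lies entirely in finding and inverting F(x).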

A simple but often used application of the inversion method is the
generation of a set of random numbers {x_i} distributed uniformly over the
interval a≤x≤b. The density function here is evidently

\[ P(x) = \begin{cases} 1/(b-a), & a \le x \le b \\ 0, & \text{otherwise} \end{cases} \tag{2.10a} \]

Using (2.2) we find that the distribution function in the interval
a≤x≤b is

\[ F(x) = (x-a)/(b-a) \tag{2.10b} \]

The inversion of F(x) here is easily accomplished, and the construction
rule (2.9) takes the entirely plausible form

\[ x_i = a + r_i (b-a) \tag{2.10c} \]

As a second general procedure for generating random numbers
{x_i} according to a prescribed density function P(x), we consider the

Rejection Method: For this method it is required that the given
density function P(x) vanish everywhere outside some finite interval
a≤x≤b, and be bounded by some finite number B inside that interval.
Furthermore, in addition to the set of random numbers {r_i}
distributed uniformly over the unit interval, we shall also need an
independent set of random numbers {x'_i} distributed uniformly over the
interval a≤x≤b [see (2.10)]. The generating procedure is then as
follows. Draw a pair of random numbers (x'_i, r_i) from the given sets,
and take x'_i to be a member of the set {x_i} if

\[ P(x'_i)/B > r_i \tag{2.11} \]

If (2.11) is not satisfied, reject the pair (x'_i, r_i) and keep drawing
new pairs until the inequality is satisfied.

The proof that the set of x'_i-values which pass the "acceptance
criterion" (2.11) is indeed distributed according to the density function
P(x) is somewhat more complicated than the proof for the inversion method,
and is presented in Appendix A. We merely point out here that the
acceptance criterion (2.11) is evidently statistically favorable to x'_i-values
for which P(x) is relatively large, and is statistically unfavorable to
x'_i-values for which P(x) is relatively small. We should also note that,
because a ratio is taken in (2.11), P(x) and its upper bound B need be
known only up to an overall constant factor; i.e., the "normalization
constant" need not be known when using the rejection method. In any case,
one finds that the efficiency of this generating process, or the
probable fraction of the x'_i-values which will be accepted as x_i-values,
is given by [see Appendix A]

\[ E = \frac{\int_a^b P(x)\,dx}{B\,(b-a)} \tag{2.12} \]

Geometrically, E can be interpreted as the ratio of the area under the
curve P(x) to the area under the rectangle of height B and width (b-a)
which encloses the P(x) curve. Clearly, then, it is desirable to choose
for B the smallest upper bound on P(x), and for (a,b) the smallest
interval outside of which P(x) vanishes identically.
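
The acceptance test (2.11) and the efficiency formula (2.12) can be tried on a small example. The sketch below is an illustration under assumed choices that do not come from the text: the unnormalized density P(x) = sin(x) on the interval from 0 to π with bound B = 1, Python's `random` module as the source of uniform numbers, and the hypothetical function name `rejection_sample`.

```python
import math
import random

def rejection_sample(p, a, b, bound, n, rng):
    """Draw n numbers with density proportional to p on [a, b] using (2.11).

    p need not be normalized; bound must satisfy p(x) <= bound on [a, b].
    Returns the accepted values and the observed acceptance fraction.
    """
    accepted, trials = [], 0
    while len(accepted) < n:
        x_trial = a + rng.random() * (b - a)   # uniform on [a, b], as in (2.10c)
        r = rng.random()                       # independent uniform on [0, 1)
        trials += 1
        if p(x_trial) / bound > r:             # acceptance criterion (2.11)
            accepted.append(x_trial)
    return accepted, n / trials

rng = random.Random(0)
xs, efficiency = rejection_sample(math.sin, 0.0, math.pi, 1.0, 100_000, rng)
expected_efficiency = 2.0 / (1.0 * math.pi)    # (2.12): area under sin(x) is 2
print(efficiency, expected_efficiency, sum(xs) / len(xs))
```

Note that sin(x) is passed in without its normalizing factor of 1/2, in keeping with the remark that only the ratio P(x)/B matters.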

In our derivation of the rejection method in Appendix A, it will
be seen that this method can actually be formulated in a slightly more
general way: If the initial set of random numbers {x'_i} is distributed
over a≤x≤b according to some density function \bar{P}(x) [not necessarily the
uniform function in (2.10a)], then the density function of the set {x_i}
which is constructed according to the selection process (2.11) will be
C \bar{P}(x) P(x), C being the appropriate normalizing constant. This way of
generating according to a product density function is usually less
efficient than an "all-at-once" approach, but in some situations it may
prove to be more convenient.
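
A small case shows how a product density arises from this generalized procedure. In the sketch below, which rests entirely on assumed choices not found in the text, the trial values follow the density 2x on (0, 1) (generated by inversion as the square root of a uniform number), and the selection step uses P(x) = 1 - x with bound B = 1. The accepted values should then follow the normalized product 6x(1 - x), whose mean is 1/2 and whose variance is 1/20.

```python
import math
import random

def product_density_sample(n, rng):
    """Selection step (2.11) applied to a non-uniform trial set.

    Trial density: 2x on (0, 1), obtained by inversion as sqrt(r).
    Selection function: P(x) = 1 - x with bound B = 1.
    Accepted values follow C * 2x * (1 - x) = 6x(1 - x).
    """
    out = []
    while len(out) < n:
        x_trial = math.sqrt(rng.random())          # non-uniform trial value
        if (1.0 - x_trial) / 1.0 > rng.random():   # criterion (2.11), B = 1
            out.append(x_trial)
    return out

rng = random.Random(0)
xs = product_density_sample(100_000, rng)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)
```

About two thirds of the trial values are discarded in this example, which illustrates why the product approach tends to be less efficient than generating from the target density directly.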

To compare in a few words the inversion and rejection methods for
generating random numbers {x_i} according to a prescribed density function
P(x), we may say that the inversion method constructs the set {x_i} by
distorting a uniformly distributed set through the distribution function,
while the rejection method constructs the set {x_i} by making selections
from a uniformly distributed set, the selections being randomly biased
according to the density function. In any given situation, speed and
convenience will usually select one method over the other. The inversion
method is 100% efficient in its use of random numbers, but it requires
calculating and inverting the distribution function, a task which is
sometimes quite difficult. The rejection method does not require a
knowledge of the distribution function nor even the absolutely normalized
density function, but it does require us to know a reasonable upper bound
on the density function; moreover, if the shape of the curve P(x)-versus-x
is such that the area of the smallest box enclosing this curve is very
much larger than the area under this curve, then the rejection method
will be very inefficient.
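
The trade-off just described can be tried out on a small case. The following is only a sketch under stated assumptions: the density P(x) = 2x on (0,1), the bound B = 2, and all function names are illustrative choices, not taken from the text. For this density F(x) = x^2, so inversion gives x = sqrt(r), while rejection should keep about half of its trial points.

```python
import math
import random


def sample_by_inversion(rng):
    """Inversion for the assumed density P(x) = 2x on (0, 1).

    F(x) = x**2, so solving r = F(x) gives x = sqrt(r).
    """
    return math.sqrt(rng.random())


def sample_by_rejection(rng, bound=2.0):
    """Rejection for the same density, with upper bound B = 2.

    Returns the accepted value and the number of trial points used.
    """
    trials = 0
    while True:
        trials += 1
        x_trial = rng.random()          # trial point, uniform on (0, 1)
        r = rng.random()                # independent uniform number
        if 2.0 * x_trial / bound >= r:  # keep with probability P(x)/B
            return x_trial, trials


def compare(count=100_000, seed=1):
    """Return (mean by inversion, mean by rejection, rejection efficiency)."""
    rng = random.Random(seed)
    inv = [sample_by_inversion(rng) for _ in range(count)]
    rej_total, used = 0.0, 0
    for _ in range(count):
        x, trials = sample_by_rejection(rng)
        rej_total += x
        used += trials
    return sum(inv) / count, rej_total / count, count / used
```

Both sample means should approach the exact mean 2/3, and the measured efficiency should approach the area ratio 1/2; inversion spends one random number per sample, rejection about four.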

One other method for generating a set of random numbers will be
described in Sec. 2-8, after we have examined the problem of generating
random points in more than one dimension.

2-4. Specifying Sets of Random Points

We shall now see how the foregoing ideas concerning the specification
and construction of sets of random points in one dimension can be
generalized to any number of dimensions. For concreteness, and with no
real loss of generality, we shall confine our discussion mainly to the
three-dimensional case; here we denote a general point by
$\mathbf{x} \equiv (x,y,z)$, where x, y and z are ordinary real variables.
When we speak of a set of random points {(x_i,y_i,z_i)} distributed
according to the probability density function
$P(\mathbf{x}) \equiv P(x,y,z)$, we mean an inexhaustible set of triplets
of real numbers from which we may "draw" sequential elements (x_1,y_1,z_1),
(x_2,y_2,z_2), ..., such that

$$
\begin{aligned}
P(\mathbf{x})\,d\mathbf{x} \equiv P(x,y,z)\,dx\,dy\,dz = {}
 & \text{probability that } x_i \text{ will lie between } x \text{ and } x+dx, \\
 & \text{and } y_i \text{ will lie between } y \text{ and } y+dy, \\
 & \text{and } z_i \text{ will lie between } z \text{ and } z+dz.
\end{aligned}
\tag{2.13}
$$

A set of random points {(x_i,y_i,z_i)} is completely characterized by
its density function P(x,y,z). However, it is often convenient to
introduce a number of "lesser" density functions which characterize only
certain particular features of the distribution. For example, we may
define the contracted density functions P(x,y) and P(x) by

P(x,y)dxdy = probability that x_i will lie between x and
             x+dx, and y_i will lie between y and y+dy,
             regardless of where z_i lies.                    (2.14a)

and

P(x)dx = probability that x_i will lie between x and x+dx,
         regardless of where y_i and z_i lie.                 (2.14b)

In a similar way we may also define the contracted density functions
P(y,z), P(x,z), P(y) and P(z). Of course, the functional forms of these
contracted density functions will in general all be different; e.g.,
P(y,z) is generally not the same function of y and z as P(x,y) is of
x and y, and P(z) is generally not the same function of z as P(x) is
of x. Nevertheless, we shall avoid a cumbersome subscripting of these
P-functions, and trust that our meaning will always be clear from context.
It is easy to obtain expressions for the contracted density functions
in terms of the full density function P(x,y,z), simply by invoking the
addition theorem for probabilities. Thus, the probability in (2.14a) is
obtained simply by summing (integrating) the probability in (2.13) over
all dz-intervals, and the probability in (2.14b) is obtained by further
summing over all dy-intervals:

$$P(x,y) = \int_{-\infty}^{\infty} dz'\, P(x,y,z') \tag{2.15a}$$

$$P(x) = \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x,y',z') \tag{2.15b}$$

Of course, if we sum (2.13) over all xyz-space, we should get unity
(certainty), just as in (2.3):

$$\int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x',y',z') = 1 \tag{2.16}$$

In addition to the contracted density functions defined in (2.14),
we will also make use of various conditional density functions, which
are defined as follows:

P(y,z|x)dydz = probability that y_i will lie between y and
               y+dy, and z_i will lie between z and z+dz,
               given that x_i = x.                            (2.17a)

P(y|x)dy = probability that y_i will lie between y and y+dy,
           given that x_i = x, regardless of where z_i lies.  (2.17b)

P(z|x,y)dz = probability that z_i will lie between z and
             z+dz, given that x_i = x and y_i = y.            (2.17c)

We read P(y,z|x) as "P of y and z conditioned on x", P(y|x) as "P of
y conditioned on x", and P(z|x,y) as "P of z conditioned on x and y". We
may obviously introduce six more conditional density functions with
different arrangements of the variables with respect to the vertical
slash, e.g., P(x,z|y), P(x|y), etc. However, it should be clearly
understood that all these conditional density functions are generally
different functional forms; e.g., P(x|y) is not the same function of
x and y as P(x|z) is of x and z, etc.

As with the contracted density functions in (2.14), the conditional
density functions in (2.17) are completely determined by the form of
the full density function P(x,y,z). We may derive the expressions for
the conditional density functions in (2.17) as follows: Applying the
multiplication theorem for probabilities to the probabilities defined
in (2.14b) and (2.17a), we see that

$$P(x)\,dx \cdot P(y,z|x)\,dy\,dz = P(x,y,z)\,dx\,dy\,dz$$

Therefore,

$$P(y,z|x) = P(x,y,z) / P(x)$$

or, with (2.15b),

$$P(y,z|x) = P(x,y,z) \Big/ \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x,y',z') \tag{2.18a}$$


Now treating x as a fixed parameter, the addition theorem for probabilities
yields the following relation between the probabilities defined in
(2.17a) and (2.17b):

$$P(y|x)\,dy = \int_{-\infty}^{\infty} dz'\, dy\, P(y,z'|x)$$

Inserting (2.18a) yields for P(y|x) the formula

$$P(y|x) = \int_{-\infty}^{\infty} dz'\, P(x,y,z') \Big/ \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x,y',z') \tag{2.18b}$$

Finally, again treating x as fixed and applying the multiplication
theorem to the probabilities defined in (2.17b) and (2.17c), we see that

$$P(y|x)\,dy \cdot P(z|x,y)\,dz = P(y,z|x)\,dy\,dz$$

Therefore,

$$P(z|x,y) = P(y,z|x) / P(y|x)$$

or, inserting (2.18a) and (2.18b),

$$P(z|x,y) = P(x,y,z) \Big/ \int_{-\infty}^{\infty} dz'\, P(x,y,z') \tag{2.18c}$$


It will be observed from the explicit formulae for the two-dimensional
density functions P(x,y) in (2.15a) and P(y,z|x) in (2.18a) that each
is correctly normalized:

$$\int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, P(x',y') = \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(y',z'|x) = 1 \tag{2.19}$$

Similarly, it will be observed from the explicit formulae for the
one-dimensional density functions P(x) in (2.15b), P(y|x) in (2.18b) and
P(z|x,y) in (2.18c) that they are also correctly normalized:

$$\int_{-\infty}^{\infty} dx'\, P(x') = \int_{-\infty}^{\infty} dy'\, P(y'|x) = \int_{-\infty}^{\infty} dz'\, P(z'|x,y) = 1 \tag{2.20}$$

The formulae for these one-dimensional density functions will also be
observed to imply the following important relation [cf. (2.15b), (2.18b)
and (2.18c)]:

$$P(x,y,z) = P(x)\,P(y|x)\,P(z|x,y) \tag{2.21a}$$

The physical meaning of this equation is best seen by writing it in the
form

$$P(x,y,z)\,dx\,dy\,dz = P(x)\,dx \cdot P(y|x)\,dy \cdot P(z|x,y)\,dz \tag{2.21b}$$

which says that the probability for simultaneously finding x_i, y_i and
z_i in the respective intervals (x,x+dx), (y,y+dy) and (z,z+dz) is equal
to the product of: (i) the probability for finding x_i in (x,x+dx),
times (ii) the probability for finding y_i in (y,y+dy) given that x_i = x,
times (iii) the probability for finding z_i in (z,z+dz) given that x_i = x
and y_i = y. In other words, (2.21b) is really a consequence of the
multiplication theorem for probabilities. We shall refer to the act of
expressing the full three-variable density function P(x,y,z) as a product
of three one-variable density functions as "conditioning P(x,y,z)".

The fact that we have derived explicit formulae for the three
one-dimensional density functions in (2.21), namely

$$P(x) = \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x,y',z') \tag{2.22a}$$

$$P(y|x) = \int_{-\infty}^{\infty} dz'\, P(x,y,z') \Big/ \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\, P(x,y',z') \tag{2.22b}$$

$$P(z|x,y) = P(x,y,z) \Big/ \int_{-\infty}^{\infty} dz'\, P(x,y,z') \tag{2.22c}$$

proves that in principle it is always possible to condition P(x,y,z).
Indeed, it is always possible to carry out the conditioning with respect
to any ordering of the variables; e.g., we could condition P(x,y,z) as
P(y)P(x|y)P(z|x,y) or as P(y)P(z|y)P(x|y,z), etc.
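
The conditioning formulae (2.22) can be checked numerically on a coarse grid. The sketch below is illustrative only: the test density, the restriction to the unit cube, the midpoint sums that stand in for the integrals, and all names are assumptions made for the example.

```python
def example_density(x, y, z):
    """Hypothetical unnormalized density on the unit cube (positive everywhere)."""
    return x + y * z + 0.1


def condition_on_grid(f, n=6):
    """Tabulate P(x), P(y|x) and P(z|x,y) of (2.22) on an n*n*n midpoint grid.

    Integrals over the unit cube are replaced by midpoint sums with step h.
    """
    h = 1.0 / n
    pts = [(k + 0.5) * h for k in range(n)]
    raw = [[[f(x, y, z) for z in pts] for y in pts] for x in pts]
    total = sum(v for plane in raw for row in plane for v in row) * h ** 3
    # Normalized full density, so that the analogue of (2.16) holds.
    P = [[[v / total for v in row] for row in plane] for plane in raw]
    # Integral over z' of P(x,y,z'), the common ingredient of (2.22b,c).
    Pxy = [[sum(P[i][j]) * h for j in range(n)] for i in range(n)]
    # (2.22a): integrate over y' and z'.
    Px = [sum(Pxy[i]) * h for i in range(n)]
    # (2.22b) and (2.22c).
    Py_x = [[Pxy[i][j] / Px[i] for j in range(n)] for i in range(n)]
    Pz_xy = [[[P[i][j][k] / Pxy[i][j] for k in range(n)]
              for j in range(n)] for i in range(n)]
    return P, Px, Py_x, Pz_xy, h
```

On the grid the product Px[i] * Py_x[i][j] * Pz_xy[i][j][k] reproduces P[i][j][k] to rounding error, which is the discrete counterpart of (2.21a), and each one-variable table sums (times h) to unity as in (2.20).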

Since P(x), P(y|x) and P(z|x,y) in (2.22) are one-dimensional
density functions, we can introduce in analogy with (2.2) their
associated distribution functions F(x), F(y|x) and F(z|x,y):

$$F(x) \equiv \int_{-\infty}^{x} P(x')\,dx' \tag{2.23a}$$

$$F(y|x) \equiv \int_{-\infty}^{y} P(y'|x)\,dy' \tag{2.23b}$$

$$F(z|x,y) \equiv \int_{-\infty}^{z} P(z'|x,y)\,dz' \tag{2.23c}$$

Thus, for example, F(y|x) is the probability that y_i will be less than
y given that x_i = x, regardless of the value of z_i.

We may also define a three-variable distribution function F(x,y,z) by

$$F(x,y,z) \equiv \int_{-\infty}^{x} dx' \int_{-\infty}^{y} dy' \int_{-\infty}^{z} dz'\, P(x',y',z') \tag{2.24}$$

Evidently, F(x,y,z) is the probability that a randomly selected element
(x_i,y_i,z_i) will simultaneously have x_i<x, y_i<y and z_i<z. In Monte Carlo
applications, though, distribution functions with more than one argument
are not of much use, for we recall that F(x) in (2.2) is chiefly of
interest because of the role which F^{-1} plays in the inversion generating
method. However, F(x,y,z) in (2.24) is evidently a mapping from 3-space
into 1-space, and hence does not have an inverse. The one-variable
distribution functions in (2.23), on the other hand, do have unique inverses,
and they will play an important role in the generalization of the
inversion generating method, as will be seen in the next section.

Suppose a given set of random points {(x_i,y_i,z_i)} with density
function P_1(x,y,z) is transformed into a new set of random points
{(u_i,v_i,w_i)} by the transformation

$$u = U(x,y,z), \qquad v = V(x,y,z), \qquad w = W(x,y,z) \tag{2.25}$$

What will be the density function P_2(u,v,w) of the set {(u_i,v_i,w_i)}? If
dudvdw is the image of the volume element dxdydz under the transformation
(2.25), then clearly the probability for finding a point (u_i,v_i,w_i) inside
dudvdw is the same as the probability for finding a point (x_i,y_i,z_i) inside
dxdydz. Hence, in analogy with (2.6), we have

$$P_2(u,v,w)\,du\,dv\,dw = P_1(x,y,z)\,dx\,dy\,dz \tag{2.26}$$


The mathematical statement of the fact that the volume element dudvdw
centered at (u,v,w) is the image of the volume element dxdydz at (x,y,z)
is simply Eq. (2.25) together with [cf. (2.5b)]

$$du\,dv\,dw = \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| dx\,dy\,dz \tag{2.27a}$$

Here,

$$
\frac{\partial(u,v,w)}{\partial(x,y,z)} \equiv
\begin{vmatrix}
\dfrac{\partial u}{\partial x} & \dfrac{\partial v}{\partial x} & \dfrac{\partial w}{\partial x} \\
\dfrac{\partial u}{\partial y} & \dfrac{\partial v}{\partial y} & \dfrac{\partial w}{\partial y} \\
\dfrac{\partial u}{\partial z} & \dfrac{\partial v}{\partial z} & \dfrac{\partial w}{\partial z}
\end{vmatrix}
\tag{2.27b}
$$

is the Jacobian of the transformation (2.25), in which it is understood
that the partial derivatives are all evaluated via (2.25) at the point
(x,y,z) under consideration. [Readers who are not altogether familiar with
Jacobians and their significance may find the short, heuristic discussion
given in Appendix B helpful.] With (2.27a), (2.26) implies that the
density function of the transformed set {(u_i,v_i,w_i)} is


$$P_2(u,v,w) = P_1(x,y,z) \Big/ \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| \tag{2.28a}$$

or equivalently [cf. (B.8)]

$$P_2(u,v,w) = P_1(x,y,z) \left| \frac{\partial(x,y,z)}{\partial(u,v,w)} \right| \tag{2.28b}$$

where x, y and z are now to be regarded as functions of u, v and w through
the inverse of (2.25). If the transformation (2.25) is not strictly
one-to-one, so that a given dudvdw element is populated by several dxdydz
elements, then the right sides of (2.26) and (2.28) will have to be summed
over all contributing dxdydz elements.
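
A quick Monte Carlo check of (2.28a) may be helpful here. The sketch below is illustrative only: the transformation u = x, v = xy, w = z^2 of points uniform in the unit cube, the test box, and all names are assumptions chosen for the example. For this map the Jacobian (2.27b) is 2xz, so (2.28a) predicts P_2(u,v,w) = 1/(2u sqrt(w)) on 0 < v < u < 1, 0 < w < 1.

```python
import math
import random


def transform(x, y, z):
    """Assumed example transformation (2.25): u = x, v = x*y, w = z**2."""
    return x, x * y, z * z


def p2(u, v, w):
    """Density predicted by (2.28a) for P_1 = 1 on the unit cube.

    The Jacobian of the example map is 2*x*z = 2*u*sqrt(w).
    """
    if not (0.0 < v < u < 1.0 and 0.0 < w < 1.0):
        return 0.0
    return 1.0 / (2.0 * u * math.sqrt(w))


BOX = ((0.5, 1.0), (0.0, 0.5), (0.25, 1.0))  # test box in uvw-space


def box_probability_mc(count=100_000, seed=2):
    """Fraction of transformed points that land inside BOX."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(count):
        point = transform(rng.random(), rng.random(), rng.random())
        if all(lo <= c <= hi for c, (lo, hi) in zip(point, BOX)):
            hits += 1
    return hits / count


def box_probability_from_density(m=40):
    """Midpoint-rule integral of p2 over BOX."""
    steps = [(hi - lo) / m for lo, hi in BOX]
    total = 0.0
    for i in range(m):
        u = BOX[0][0] + (i + 0.5) * steps[0]
        for j in range(m):
            v = BOX[1][0] + (j + 0.5) * steps[1]
            for k in range(m):
                w = BOX[2][0] + (k + 0.5) * steps[2]
                total += p2(u, v, w)
    return total * steps[0] * steps[1] * steps[2]
```

The two estimates should agree to within sampling error; for this box the integral of p2 can also be done by hand and equals (ln 2)/4.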


2-5. The Generalized Inversion Method

Let us now see how the inversion method for generating random numbers
x_i according to a given density function P(x) can be generalized to
generate random triplets (x_i,y_i,z_i) according to a given density function
P(x,y,z). As already mentioned, the noninvertibility of the distribution
function F(x,y,z) precludes its use in a formula of the type (2.9). Instead,
we proceed as follows:

Generalized Inversion Method: First, condition the given
density function P(x,y,z) in the form P(x)P(y|x)P(z|x,y)
[cf. (2.21) and (2.22)], and calculate the corresponding
one-dimensional distribution functions F(x), F(y|x) and F(z|x,y)
[cf. (2.23)]. Then, with r_1i, r_2i and r_3i three independent
random numbers drawn from the set {r_i}, first obtain x_i
by solving (inverting)

$$r_{1i} = F(x_i) \tag{2.29a}$$

then obtain y_i by solving (inverting)

$$r_{2i} = F(y_i|x_i) \tag{2.29b}$$

where x_i is the value found in (2.29a), and finally, obtain
z_i by solving (inverting)

$$r_{3i} = F(z_i|x_i,y_i) \tag{2.29c}$$

where x_i and y_i are the values found in (2.29a) and (2.29b),
respectively.

That the set {(x_i,y_i,z_i)} constructed according to the foregoing
procedure actually has P(x,y,z) as its density function can be proved
as follows: First note that, in picking three random numbers r_1i, r_2i
and r_3i from the set of random numbers {r_i} distributed uniformly
in the unit interval, we are essentially picking one random point
(r_1i,r_2i,r_3i) from the set of random points distributed uniformly over
the unit cube in r_1r_2r_3-space. That is, since the probability for
simultaneously finding r_1i in (r_1,r_1+dr_1), r_2i in (r_2,r_2+dr_2), and
r_3i in (r_3,r_3+dr_3) is just

$$P(r_1)\,dr_1 \cdot P(r_2)\,dr_2 \cdot P(r_3)\,dr_3$$

where P(r) is given by (2.8a), then the probability density function
P_1(r_1,r_2,r_3) for the set of random triplets {(r_1i,r_2i,r_3i)} is

$$
P_1(r_1,r_2,r_3) = P(r_1)P(r_2)P(r_3) =
\begin{cases}
1, & \text{if } 0 \le r_j \le 1,\ j = 1,2,3 \\
0, & \text{otherwise}
\end{cases}
$$

Now, regarding (2.29), or the inverse thereof, as a transformation which
carries each (r_1i,r_2i,r_3i)-point into an (x_i,y_i,z_i)-point, it follows
from (2.28b) that the density function P_2(x,y,z) of the set {(x_i,y_i,z_i)}
is

$$P_2(x,y,z) = P_1(r_1,r_2,r_3) \left| \frac{\partial(r_1,r_2,r_3)}{\partial(x,y,z)} \right| = \left| \frac{\partial(r_1,r_2,r_3)}{\partial(x,y,z)} \right|$$
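Now, from (2.29a), r_1 is independent of y and z, and from (2.29b),
r_2 is independent of z. Hence, all elements of the Jacobian determinant
below the main diagonal vanish, so

$$
\begin{aligned}
P_2(x,y,z) &= \frac{\partial r_1}{\partial x} \cdot \frac{\partial r_2}{\partial y} \cdot \frac{\partial r_3}{\partial z} \\
 &= \frac{d}{dx}F(x) \cdot \frac{\partial}{\partial y}F(y|x) \cdot \frac{\partial}{\partial z}F(z|x,y) && \text{[by (2.29)]} \\
 &= P(x) \cdot P(y|x) \cdot P(z|x,y) && \text{[by (2.23)]} \\
 &= P(x,y,z) && \text{[by (2.21)]}
\end{aligned}
$$

which establishes the desired result.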

From the point of view of the foregoing proof, the generalized
inversion formulae in (2.29) produce the desired results because these
formulae constitute a transformation from r_1r_2r_3-space to xyz-space
which has the special property that

$$\frac{\partial(r_1,r_2,r_3)}{\partial(x,y,z)} = P(x,y,z) \tag{2.30}$$

From a less formal point of view, however, it is clear that the
three-dimensional inversion method is nothing more than three successive
applications of the one-dimensional inversion method to the conditioned
form of the density function. That is, (2.29a) generates a random number
x_i according to P(x), (2.29b) generates a random number y_i according to
P(y|x_i), and (2.29c) generates a random number z_i according to P(z|x_i,y_i).
Thus, once one appreciates the significance of the conditioned form
of the density function in (2.21), the generalized inversion method
presented above is intuitively quite plausible.
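
To make the successive inversions concrete, here is a minimal two-variable sketch. The density P(x,y) = x + y on the unit square is an assumption chosen because everything can be done in closed form: P(x) = x + 1/2, F(x) = (x^2 + x)/2, P(y|x) = (x + y)/(x + 1/2) and F(y|x) = (xy + y^2/2)/(x + 1/2). The function names are illustrative.

```python
import math
import random


def sample_xy(rng):
    """One point (x, y) with assumed density P(x,y) = x + y on the unit square."""
    r1, r2 = rng.random(), rng.random()
    # Analogue of (2.29a): solve r1 = F(x) = (x**2 + x)/2 for x in [0, 1].
    x = 0.5 * (-1.0 + math.sqrt(1.0 + 8.0 * r1))
    # Analogue of (2.29b): solve r2 = F(y|x) = (x*y + y**2/2)/(x + 1/2) for y.
    y = -x + math.sqrt(x * x + 2.0 * r2 * (x + 0.5))
    return x, y


def sample_moments(count=100_000, seed=5):
    """Return the sample means of x and of x*y."""
    rng = random.Random(seed)
    sx = sxy = 0.0
    for _ in range(count):
        x, y = sample_xy(rng)
        sx += x
        sxy += x * y
    return sx / count, sxy / count
```

For this density the exact mean of x is 7/12 and the exact mean of xy is 1/3, which the sample moments should approach as the number of points grows.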

It should be noted that one has considerable flexibility in applying
the generalized inversion method. Thus, if one or more of the
distribution functions F(x), F(y|x) and F(z|x,y) are intractable, one can
try to condition P(x,y,z) in another form, say, P(y)P(z|y)P(x|y,z), and
thus work with the different distribution functions F(y), F(z|y) and
F(x|y,z). Alternatively, it will be observed that any or all of the
three successive steps in (2.29a), (2.29b) and (2.29c) could actually be
carried out by applying the one-dimensional rejection method. For example,
once x_i has been picked according to (2.29a), one could replace (2.29b)
by an application of the one-dimensional rejection method to generate a
random number y_i according to the density function P(y|x_i), and then
proceed as usual with (2.29c). However, we shall regard such
applications of the one-dimensional rejection method as still falling under
the scope of the "generalized inversion method", and reserve the term
"generalized rejection method" for a procedure to be described later.

2-6. Generating Uniformly Distributed Random Points

A very important application of the generalized inversion method is
the generating of random points from a uniform distribution inside some
given region Ω. Suppose for now that Ω can be specified in the
following way:†

$$\Omega = \{(x,y,z) \mid a_1 \le x \le b_1,\ a_2(x) \le y \le b_2(x),\ a_3(x,y) \le z \le b_3(x,y)\} \tag{2.31a}$$

The volume |Ω| of the region Ω is thus given by the integral

$$|\Omega| = \int_{a_1}^{b_1} dx' \int_{a_2(x')}^{b_2(x')} dy'\, [b_3(x',y') - a_3(x',y')] \tag{2.31b}$$

where the integrand of course represents the result of the trivial
z'-integration.

†Eq. (2.31a) is to be read "Ω is the set of all points (x,y,z) for which
a_1≤x≤b_1, a_2(x)≤y≤b_2(x) and a_3(x,y)≤z≤b_3(x,y)".

The density function defining a uniform distribution of points inside
Ω is

$$
P(x,y,z) =
\begin{cases}
1/|\Omega|, & \text{for } (x,y,z) \in \Omega \\
0, & \text{for } (x,y,z) \notin \Omega
\end{cases}
\tag{2.32}
$$

To apply the generalized inversion method to generate random points
according to this P(x,y,z), we must evidently condition P(x,y,z) in the
manner of (2.21). This is not in general a trivial task, because of the
boundaries of Ω. Thus, inserting (2.32) into (2.22), we find for the
one-variable density functions

$$P(x) = |\Omega|^{-1} \int_{a_2(x)}^{b_2(x)} dy'\, [b_3(x,y') - a_3(x,y')], \qquad a_1 \le x \le b_1 \tag{2.33a}$$

$$P(y|x) = [b_3(x,y) - a_3(x,y)] \Big/ \int_{a_2(x)}^{b_2(x)} dy'\, [b_3(x,y') - a_3(x,y')], \qquad a_2(x) \le y \le b_2(x) \tag{2.33b}$$

$$P(z|x,y) = 1/[b_3(x,y) - a_3(x,y)], \qquad a_3(x,y) \le z \le b_3(x,y) \tag{2.33c}$$


The corresponding one-variable distribution functions in (2.23) are
therefore given by

$$F(x) = |\Omega|^{-1} \int_{a_1}^{x} dx' \int_{a_2(x')}^{b_2(x')} dy'\, [b_3(x',y') - a_3(x',y')], \qquad a_1 \le x \le b_1 \tag{2.34a}$$

$$F(y|x) = \int_{a_2(x)}^{y} dy'\, [b_3(x,y') - a_3(x,y')] \Big/ \int_{a_2(x)}^{b_2(x)} dy'\, [b_3(x,y') - a_3(x,y')], \qquad a_2(x) \le y \le b_2(x) \tag{2.34b}$$

$$F(z|x,y) = [z - a_3(x,y)] / [b_3(x,y) - a_3(x,y)], \qquad a_3(x,y) \le z \le b_3(x,y) \tag{2.34c}$$


Thus, to generate a random point uniformly inside Ω by the generalized
inversion method, we must insert the above distribution functions into
(2.29), and solve successively for x_i, y_i and z_i. Depending upon the
shape of Ω (i.e., depending upon the boundary functions a_2(x), b_2(x),
a_3(x,y), b_3(x,y)) this may be a very easy task or a very difficult task.
The easiest case is realized when Ω is a "box", with the boundary
functions a_j and b_j all constants. In this case (2.31b) gives

$$|\Omega| = (b_1-a_1)(b_2-a_2)(b_3-a_3)$$

The one-variable density functions in (2.33) become

$$
\begin{aligned}
P(x) &= 1/(b_1-a_1), & a_1 \le x \le b_1 \\
P(y|x) &= 1/(b_2-a_2), & a_2 \le y \le b_2 \\
P(z|x,y) &= 1/(b_3-a_3), & a_3 \le z \le b_3
\end{aligned}
$$

and the corresponding one-variable distribution functions in (2.34) become

$$
\begin{aligned}
F(x) &= (x-a_1)/(b_1-a_1), & a_1 \le x \le b_1 \\
F(y|x) &= (y-a_2)/(b_2-a_2), & a_2 \le y \le b_2 \\
F(z|x,y) &= (z-a_3)/(b_3-a_3), & a_3 \le z \le b_3
\end{aligned}
$$

Inserting these distribution functions into (2.29) and inverting, we
obtain the following algorithm for generating a random point (x_i,y_i,z_i)
from a uniform distribution inside the box {a_1≤x≤b_1, a_2≤y≤b_2, a_3≤z≤b_3}:

$$x_i = a_1 + (b_1-a_1)\,r_{1i} \tag{2.35a}$$

$$y_i = a_2 + (b_2-a_2)\,r_{2i} \tag{2.35b}$$

$$z_i = a_3 + (b_3-a_3)\,r_{3i} \tag{2.35c}$$

Here, r_1i, r_2i and r_3i are independent random numbers from a uniform
distribution in the unit interval. Eqs. (2.35) are precisely what
we should expect on the basis of the rule in (2.10c): we simply generate
each coordinate independently from a uniform distribution along the
corresponding edge of the box.

When the boundary functions a_j and b_j are not all constants, so that
Ω is not a box, then the one-dimensional distribution functions must be
calculated according to (2.34), inserted into (2.29), and inverted. It is
important to realize that, in this general case, one will not obtain
equations having the simple form (2.35). That is, although (2.34c) and
(2.29c) will indeed produce an equation like (2.35c) with a_3 and b_3
replaced by a_3(x_i,y_i) and b_3(x_i,y_i), (2.34b) and (2.29b) will not produce
an equation like (2.35b) with a_2 and b_2 replaced by a_2(x_i) and b_2(x_i),
and (2.34a) and (2.29a) will not produce (2.35a). To put it differently,
although P(z|x,y) in (2.33c) indeed describes, for fixed x and y, a
uniform distribution in z, P(y|x) in (2.33b) does not describe, for fixed x,
a uniform distribution in y, and P(x) in (2.33a) does not describe a
uniform distribution in x. The point here is that the correct version
of the algorithm in (2.35) for non-box regions Ω cannot be easily
intuited a priori.

It is in principle always possible to apply the generalized
inversion method to generate random points uniformly inside a given region
Ω, provided Ω is defined by means of boundary functions a_j and b_j as
in (2.31a). In practice, though, the calculation of the one-variable
distribution functions in (2.34) and their subsequent inversion often
prove to be prohibitively difficult. Furthermore, it often happens
that the volume Ω is not defined through boundary functions of the
kinds in (2.31a), but rather through one or more inequalities between
various functions of the coordinates. In such situations, it is
sometimes feasible to proceed in the following alternate way: Choose a
box-like region Σ which completely encloses the given region Ω. Generate
random points uniformly inside Σ according to the procedure described
in connection with (2.35), but keep only those points which happen to
also lie inside Ω. Clearly, the subset of "kept" points will be
distributed randomly and uniformly inside Ω. This simple procedure has the
advantage that one can apply it without having to calculate and invert
the various one-dimensional distribution functions. Furthermore, one
does not even need to know the boundary functions a_j and b_j in (2.31a);
one only needs to be able to decide whether or not a given point in Σ
lies inside Ω. The only possible drawback to this method is its
efficiency. Clearly, the approximate fraction of uniformly distributed
random points inside Σ which also lie inside Ω will be the ratio of the
volumes, |Ω|/|Σ|. If this ratio is very small (i.e., if Ω is so shaped
that its volume is much smaller than the smallest box Σ which can be
fitted around Ω) then this method for generating random points
uniformly inside Ω will be correspondingly inefficient.
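
The enclosing-box procedure can be sketched in a few lines. In the example below the region is taken, purely as an assumption for illustration, to be the unit ball x^2 + y^2 + z^2 ≤ 1 inside the box [-1,1]^3; the function names are likewise hypothetical. The fraction of kept points should approach the volume ratio (4π/3)/8 = π/6.

```python
import random


def uniform_in_box(rng, lo, hi):
    """Eqs. (2.35): each coordinate uniform along its own edge of the box."""
    return tuple(a + (b - a) * rng.random() for a, b in zip(lo, hi))


def uniform_in_region(rng, inside, lo, hi):
    """Draw points in the enclosing box until one lies inside the region.

    `inside` is any membership test; no boundary functions are needed.
    Returns the kept point and the number of box points drawn.
    """
    trials = 0
    while True:
        trials += 1
        point = uniform_in_box(rng, lo, hi)
        if inside(point):
            return point, trials


def in_unit_ball(point):
    """Assumed example region, defined by a single inequality."""
    x, y, z = point
    return x * x + y * y + z * z <= 1.0


def estimate_efficiency(count=50_000, seed=3):
    """Kept points divided by box points drawn."""
    rng = random.Random(seed)
    used = 0
    for _ in range(count):
        _, trials = uniform_in_region(
            rng, in_unit_ball, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
        used += trials
    return count / used
```

Only the membership test changes from one region to another, which is what makes this approach convenient when the region is specified by inequalities.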

We shall now illustrate the foregoing two procedures for generating
random points uniformly inside non-box regions by considering the
following two-dimensional problem: Let Ω be the region in the xy-plane
which is bounded by the x-axis, the line x=1, and the curve y=x^n,
where n is a fixed, positive integer. A sketch of Ω is shown in Fig. 4.

FIGURE 4. An example of a two-dimensional region Ω.

Suppose we wish to generate random points uniformly inside Ω.

One way of proceeding would be to generate random points
uniformly inside the unit square Σ, and then keep only those points which
happen to fall inside Ω. Letting r_1i and r_2i be two independent random
numbers from a uniform distribution in the unit interval, the generating
algorithm is evidently

$$
\left.
\begin{aligned}
x_i &= r_{1i} \\
y_i &= r_{2i}
\end{aligned}
\right\} \quad \text{keep only if } y_i \le x_i^{\,n}
\tag{2.36}
$$

Since the volume of the unit square is |Σ| = 1 and the volume of Ω is

$$|\Omega| = \int_0^1 dx \int_0^{x^n} dy = \int_0^1 x^n\,dx = \frac{1}{n+1} \tag{2.37}$$

then the efficiency of this method is

$$|\Omega| / |\Sigma| = \frac{1}{n+1} \tag{2.38}$$

For small values of n this method would not be too bad; e.g., for n=1,
Ω would be a simple triangle, and half the points generated inside Σ
would be kept. However, if n is very large this method would evidently
not be satisfactory. Let us see how we could generate the points inside
Ω directly using the generalized inversion method.

We wish to generate random points (x_i,y_i) according to the density
function

$$
P(x,y) =
\begin{cases}
1/|\Omega|, & \text{for } (x,y) \in \Omega \\
0, & \text{for } (x,y) \notin \Omega
\end{cases}
\tag{2.39}
$$

where Ω is the region shown in Fig. 4, and |Ω| is given by (2.37).

We first "condition" this density function in the form

$$P(x,y) = P(x)\,P(y|x) \tag{2.40}$$

The one-variable density functions P(x) and P(y|x) are given by
[cf. (2.22)]

$$P(x) = \int_0^{x^n} P(x,y')\,dy' = (n+1)\,x^n, \qquad 0 \le x \le 1 \tag{2.41a}$$

$$P(y|x) = P(x,y) \Big/ \int_0^{x^n} P(x,y')\,dy' = x^{-n}, \qquad 0 \le y \le x^n \tag{2.41b}$$

and the corresponding one-variable distribution functions are given by
[cf. (2.23)]

$$F(x) = \int_0^x P(x')\,dx' = x^{n+1}, \qquad 0 \le x \le 1 \tag{2.42a}$$

$$F(y|x) = \int_0^y P(y'|x)\,dy' = x^{-n}\,y, \qquad 0 \le y \le x^n \tag{2.42b}$$


Then, with r_1i and r_2i two independent random numbers from a uniform
distribution in the unit interval, we put, in accordance with (2.29),
r_1i = F(x_i) and r_2i = F(y_i|x_i), and solve successively for x_i and y_i.
The result is easily found to be

$$
\begin{aligned}
x_i &= r_{1i}^{\,1/(n+1)} \\
y_i &= x_i^{\,n}\, r_{2i}
\end{aligned}
\tag{2.43}
$$

Thus, (2.43) is the algorithm whereby one directly generates random
points uniformly inside the region Ω, shown in Fig. 4, for any fixed
value of n.
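
The two algorithms (2.36) and (2.43) are short enough to set side by side. The sketch below is illustrative (function names and the choice n = 3 in the usage note are assumptions); both samplers should produce the same distribution, with (2.43) using exactly two random numbers per point and (2.36) using on average 2(n+1).

```python
import random


def by_rejection(rng, n):
    """Algorithm (2.36): uniform point in the unit square, kept only if y <= x**n."""
    while True:
        x, y = rng.random(), rng.random()
        if y <= x ** n:
            return x, y


def by_inversion(rng, n):
    """Algorithm (2.43): direct generation by the generalized inversion method."""
    x = rng.random() ** (1.0 / (n + 1))
    y = x ** n * rng.random()
    return x, y


def mean_x(sampler, n, count=50_000, seed=4):
    """Sample mean of the x-coordinate for the given sampler."""
    rng = random.Random(seed)
    return sum(sampler(rng, n)[0] for _ in range(count)) / count
```

From (2.41a) the exact mean of x is (n+1)/(n+2); for n = 3 both `mean_x(by_rejection, 3)` and `mean_x(by_inversion, 3)` should therefore come out close to 0.8.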

It is tempting to try to "improve" the algorithm (2.36) by modifying
its second equation to read

$$y_i = x_i^{\,n}\, r_{2i}$$

This would amount to first generating a random coordinate x_i uniformly
between 0 and 1, and then a random coordinate y_i uniformly between 0
and x_i^n (rather than between 0 and 1). Clearly this would automatically
satisfy the inequality in (2.36), so that every point (x_i,y_i) generated
in this way would lie inside Ω. The trouble with this procedure is that
the points (x_i,y_i) generated in this way would not cover Ω uniformly.
To see this, we need only observe that this method would produce as many
points with x_i<1/2 as with x_i>1/2, implying that the portion of Ω in Fig. 4
to the left of the line x=1/2 would contain just as many points as the
portion of Ω to the right of this line, a situation clearly inconsistent
with a uniform distribution. The only way of first generating an
x_i-value and then generating a y_i-value such that (x_i,y_i) is always a
random point from a uniform distribution inside Ω, is to proceed
according to the algorithm (2.43).
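
The failure of the tempting shortcut is easy to see numerically. In this sketch (names are illustrative, and n = 1 is an assumed test case) we count the fraction of generated points with x < 1/2. For a uniform distribution inside Ω that fraction is F(1/2) = (1/2)^(n+1) by (2.42a), i.e. 1/4 for the triangle n = 1, whereas the shortcut gives 1/2.

```python
import random


def naive(rng, n):
    """The tempting shortcut: x uniform on (0,1), then y uniform on (0, x**n)."""
    x = rng.random()
    return x, x ** n * rng.random()


def correct(rng, n):
    """Algorithm (2.43), for comparison."""
    x = rng.random() ** (1.0 / (n + 1))
    return x, x ** n * rng.random()


def fraction_left_of_half(sampler, n, count=50_000, seed=7):
    """Fraction of generated points having x < 1/2."""
    rng = random.Random(seed)
    left = sum(1 for _ in range(count) if sampler(rng, n)[0] < 0.5)
    return left / count
```

Every point from `naive` does lie inside Ω, so the defect cannot be detected by checking membership alone; it shows up only in how the points are spread over the region.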




A more elaborate example of using the generalized inversion method
to generate random points uniformly inside a given region will be
presented in Section 2-10.

2-7. The Generalized Rejection Method

In the preceding section we showed how one could generate random
points uniformly inside a given region Ω. Having this ability, it is
possible to generate random points inside Ω according to any prescribed
density function P(x,y,z) by a straightforward generalization of the
one-dimensional rejection method:

Generalized Rejection Method: We are given a density function
P(x,y,z) which vanishes everywhere outside a specified region Ω, and
which is bounded by a number B inside Ω. We require a set of random
points {(x_i',y_i',z_i')} distributed uniformly over Ω, and also an
independent set of random numbers {r_i} distributed uniformly over the
unit interval. To generate a random point (x_i,y_i,z_i) according to
the density function P(x,y,z), draw successive pairs r_i and
(x_i',y_i',z_i') until the inequality

$$P(x_i',y_i',z_i')/B \ge r_i \tag{2.44}$$

is found to be satisfied, whereupon take (x_i,y_i,z_i) = (x_i',y_i',z_i').

The proof for this method is a straightforward generalization of the
proof in one dimension, which is given in Appendix A. As with the
one-dimensional case, it should be noted that it is only necessary to know
P(x,y,z) and its upper bound B to within a constant factor, because
only their ratio is used. In any case, the efficiency of this method
is [cf. (2.12)]

$$E = \frac{\iiint_{\Omega} P(x,y,z)\,dx\,dy\,dz}{B\,|\Omega|} \tag{2.45}$$

so it is desirable to take B equal to the least upper bound P_max of
P(x,y,z) in Ω.
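
A minimal sketch of the generalized rejection method follows. The example density f(x,y,z) = xyz on the unit cube (known only up to normalization), the bound B = 1, and all names are assumptions made for illustration. By (2.45) the efficiency should be (1/8)/(1·1) = 1/8, and the normalized density 8xyz has mean x equal to 2/3.

```python
import random


def generalized_rejection(rng, density, bound, draw_uniform_point):
    """Selection rule (2.44): keep a uniform trial point if P/B >= r.

    `density` may be unnormalized, provided `bound` is an upper bound on it
    in the same units. Returns the kept point and the number of trials used.
    """
    trials = 0
    while True:
        trials += 1
        point = draw_uniform_point(rng)
        r = rng.random()
        if density(*point) / bound >= r:
            return point, trials


def unnormalized_density(x, y, z):
    """Assumed example: proportional to x*y*z inside the unit cube."""
    return x * y * z


def uniform_in_unit_cube(rng):
    """Uniform trial point in the region, here simply the unit cube."""
    return rng.random(), rng.random(), rng.random()


def run(count=20_000, seed=6):
    """Return (sample mean of x, measured efficiency)."""
    rng = random.Random(seed)
    sum_x, used = 0.0, 0
    for _ in range(count):
        (x, _, _), trials = generalized_rejection(
            rng, unnormalized_density, 1.0, uniform_in_unit_cube)
        sum_x += x
        used += trials
    return sum_x / count, count / used
```

Passing the uniform-point generator as an argument keeps the selection rule independent of how the uniform points inside the region are obtained, whether by (2.35), by (2.43), or by an enclosing box.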

It may be noted that the alternate technique mentioned in the
previous section for generating random points uniformly in a non-box
region Ω (namely, by picking from a uniform distribution inside an
enclosing box Σ all those points which happen to fall inside Ω) is
really an application of the generalized rejection method. Thus, we
start with a set of random points {(x_i',y_i',z_i')} distributed uniformly
inside a box Σ which encloses the given region Ω, and we proceed
to construct a set {(x_i,y_i,z_i)} distributed according to the density
function P(x,y,z) in (2.32). The least upper bound for P(x,y,z) in
(2.32) is evidently B = 1/|Ω|, so the ratio on the left of (2.44) will
be 1 if (x_i',y_i',z_i') ∈ Ω and 0 if (x_i',y_i',z_i') ∉ Ω. In the former case
the inequality in (2.44) will always be satisfied and the trial point
will be kept, while in the latter case the inequality in (2.44) will
never be satisfied and the trial point will be rejected. In this case
there is never any need to draw a random number r_i: the acceptance of
the trial point depends ultimately only on whether it lies inside Ω.
The efficiency of this method is calculated from (2.45) by replacing
Ω by Σ, inserting for P(x,y,z) the function in (2.32) and putting
B = 1/|Ω|; thus,

$$E = \frac{1}{(1/|\Omega|) \cdot |\Sigma|} = \frac{|\Omega|}{|\Sigma|}$$

just as we expect.

If the set $\{(x'_i, y'_i, z'_i)\}$ used in the generalized rejection method is
distributed over $\Omega$ according to a (not necessarily uniform) density function
$P'(x,y,z)$, then the density function of the set $\{(x_i, y_i, z_i)\}$ constructed in
accordance with the selection rule (2.44) would be $C\,P'(x,y,z)\,P(x,y,z)$, $C$
being the appropriate normalization constant. This follows from a
straightforward extension to three dimensions of the arguments presented in
Appendix A.
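The selection rule (2.44) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region is taken to be a box, the trial points are uniform in that box, and the function names (`rejection_sample_3d`, `demo`) and the test density $P \propto x$ on the unit cube are ours, not the report's.

```python
import random

def rejection_sample_3d(density, bound, box, rng, max_trials=1_000_000):
    """Draw one point distributed according to `density` inside `box`.

    `density` need only be known up to a constant factor, and `bound`
    must be an upper bound on it (in the same units) throughout the box.
    `box` is ((x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi)).
    Returns (point, number_of_trials_used).
    """
    for trial in range(1, max_trials + 1):
        # Trial point uniform in the box, plus one uniform number r.
        point = tuple(lo + (hi - lo) * rng.random() for lo, hi in box)
        r = rng.random()
        if density(*point) / bound >= r:      # selection rule (2.44)
            return point, trial
    raise RuntimeError("no trial point accepted; check `bound`")

def demo(n_points=20000, seed=1):
    """Sample P(x,y,z) proportional to x on the unit cube (hypothetical example)."""
    rng = random.Random(seed)
    unit_cube = ((0.0, 1.0),) * 3

    def density(x, y, z):
        return x                              # unnormalized; normalized form is 2x

    total_x, total_trials = 0.0, 0
    for _ in range(n_points):
        (x, _y, _z), trials = rejection_sample_3d(density, 1.0, unit_cube, rng)
        total_x += x
        total_trials += trials
    # Sample mean of x, and observed acceptance fraction.
    return total_x / n_points, n_points / total_trials
```

For this test density the exact mean of $x$ is 2/3, and (2.45) predicts an efficiency of $(1/2)/(1\cdot 1) = 1/2$, so `demo()` should return values close to those two numbers.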

2-8. The Contraction Method

We have discussed two general ways of generating random points according
to a prescribed probability density function, namely the inversion method
and the rejection method. We shall now describe one more method, which we
shall call the "contraction method", for accomplishing this task. This
method is applicable whenever the given density function can be regarded
as a contracted density function of some higher-dimensional distribution
which can be easily handled. In its simplest form, the contraction
method can be described as follows:

Contraction Method: It is desired to generate a set of random points
$\{x_i\}$ according to a given density function $P(x)$, but it is found that
neither the inversion nor the rejection method offers an efficient way
of doing this. However, it is discovered that there exists a
density function $P(x,y)$ for which $P(x)$ is the $y$-contracted density
function:

$$P(x) = \int_{-\infty}^{\infty} P(x,y')\,dy' \tag{2.46}$$

It further happens that, by using either the generalized inversion
method or the generalized rejection method, it is possible to
generate random pairs $\{(x_i, y_i)\}$ according to $P(x,y)$ rapidly and
efficiently. Then, by generating such a set $\{(x_i, y_i)\}$ and simply
ignoring the $y$-coordinates, we have by (2.46) a set of $x$-coordinates
$\{x_i\}$ which are randomly distributed according to $P(x)$.


We may illustrate the potential usefulness of the contraction
generating method by considering the following example. Suppose it is
desired to generate a set of random numbers $\{y_i\}$ distributed according
to the density function

$$P(y) = \begin{cases} (n+1)\,[1 - y^{1/n}], & \text{for } 0 \le y \le 1 \\ 0, & \text{for } y<0 \text{ and } y>1 \end{cases} \tag{2.47}$$
where $n$ is some fixed, large integer. For the inversion method, we can
calculate the distribution function easily enough,

$$F(y) \equiv \int_0^y P(y')\,dy' = y\,[1 + n - n\,y^{1/n}] \tag{2.48}$$

but we observe that the equation $r_i = F(y_i)$ can be inverted, as required
by (2.9), only numerically. The rejection method would entail picking
a pair of random numbers $y'_i$ and $r_i$ uniformly in the unit interval, and
[noting that the least upper bound on $P(y)$ is $B = n+1$] taking $y'_i$ to be a
member of the desired set $\{y_i\}$ if and only if [cf. (2.11)]

$$P(y'_i)/B = 1 - {y'_i}^{1/n} \ge r_i \tag{2.49}$$

However, the efficiency of this method is easily calculated from (2.12)
to be $E = 1/(n+1)$, which is very low under the given specification that
$n$ is large.

We now astutely observe that the density function $P(y)$ given in
(2.47) coincides with the $x$-contracted density function of the quantity
$P(x,y)$ defined in (2.39), (2.37) and Fig. 4:¹

$$P(y) = \int_{-\infty}^{\infty} P(x',y)\,dx' = \int_{y^{1/n}}^{1} (n+1)\,dx' = (n+1)\,[1 - y^{1/n}]$$

Now we have already found that the algorithm in (2.43) offers a very
efficient way of generating random points $\{(x_i, y_i)\}$ according to $P(x,y)$, even
if $n$ is very large. Therefore, by first generating a number $x_i$ according to
the first of Eqs. (2.43), and then using this $x_i$-value to generate a number
$y_i$ according to the second of Eqs. (2.43), we will have thereby generated
a $y_i$-value according to the density function in (2.47). Of course, we have
had to use two random numbers to do this, but this is still a more efficient
method for large $n$ than is offered by either the inversion or rejection
methods.

¹ Notice in passing that the functional forms of $P(y)$ in (2.47) and $P(x)$
in (2.41a) are indeed quite different, even though both are contracted
from the same two-dimensional density function $P(x,y)$.
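A minimal Python sketch of this contraction example follows. Equations (2.43) are not reproduced in this section, so the two generating formulas used here are an assumption: they are what the inversion method gives for a density equal to $n+1$ on the region $0 \le y \le x^n$, $0 \le x \le 1$, namely $x_i = r_i^{1/(n+1)}$ and $y_i = x_i^n r'_i$. The function names are ours.

```python
import random

def sample_y_by_contraction(n, rng):
    """Return one y distributed as (n+1) * (1 - y**(1/n)) on (0, 1).

    Assumed reading of Eqs. (2.43): x is drawn from the contracted
    density (n+1) x^n, then y is drawn uniformly on (0, x^n).
    """
    x = rng.random() ** (1.0 / (n + 1))   # inversion of F(x) = x^(n+1)
    y = (x ** n) * rng.random()           # uniform on (0, x^n) given x
    return y                              # the x-coordinate is simply ignored

def sample_mean_y(n, n_points, seed=0):
    """Average of many contraction samples, for comparison with theory."""
    rng = random.Random(seed)
    return sum(sample_y_by_contraction(n, rng) for _ in range(n_points)) / n_points
```

Integrating $y\,P(y)$ with $P(y)$ from (2.47) gives the exact mean $(n+1)/[2(2n+1)]$, which offers a quick check: for $n=50$ the sample mean should be near $51/202 \approx 0.252$, and every call uses exactly two random numbers regardless of $n$.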

Variations on the contraction method are seen to be virtually
limitless. For example, one might find that it is a simple matter
to generate a set of random points $\{(x_i, y_i, z_i)\}$ according to a density
function $P(x,y,z)$ by applying the generalized inversion method,
conditioning $P(x,y,z)$ as $P(x)\cdot P(y|x)\cdot P(z|x,y)$ [cf. (2.29)]. Then, by
ignoring the $x$- and $y$-coordinates we have available a set of random
numbers $\{z_i\}$ distributed according to the density function

$$P(z) = \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, P(x',y',z) \tag{2.50a}$$

and by ignoring the $x$-coordinates we have available a set of random points
$\{(y_i, z_i)\}$ distributed according to the density function

$$P(y,z) = \int_{-\infty}^{\infty} dx'\, P(x',y,z) \tag{2.50b}$$

and so on.

We thus have at our disposal a variety of techniques which can be
used, in conjunction with a given set of random numbers $\{r_i\}$ distributed
uniformly in the unit interval, to construct a set of random points $\{x_i\}$
distributed according to any prescribed density function $P(x)$. In the
next chapter we shall see how to use such sets of points to numerically
estimate definite integrals. We conclude the present chapter by
considering two examples, of interest in both statistics and statistical
mechanics, which illustrate some of the ways in which one can utilize
the random number generating techniques developed in this chapter.

2-9. An Example: The Weighted Gaussian

Consider the problem of generating a set of random numbers $\{x_i\}$
distributed according to the density function

$$P(x;n,a) = \begin{cases} A(n,a)\,x^n \exp(-ax^2), & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2.51a}$$

where $n$ is any fixed non-negative integer and $a$ is any fixed positive
number. The constant $A(n,a)$ is defined so that $P(x;n,a)$ satisfies the
normalization condition (2.3); using standard integral tables one finds

$$A(n,a) = \begin{cases} 2\sqrt{a/\pi}, & n=0 \\[4pt] \dfrac{2}{\sqrt{\pi}}\,\dfrac{2^{n/2}\,a^{(n+1)/2}}{1\cdot 3\cdot 5\cdots(n-1)}, & n=2,4,6,\ldots \\[8pt] \dfrac{2\,a^{(n+1)/2}}{\left(\frac{n-1}{2}\right)!}, & n=1,3,5,\ldots \end{cases} \tag{2.51b}$$


For $n=0$, we have $P(x;0,a) = 2\sqrt{a/\pi}\,\exp(-ax^2)$, $x \ge 0$, which is often referred to
as the Gaussian curve. [More precisely, the Gaussian curve is usually
defined as $\sqrt{a/\pi}\,\exp(-ax^2)$ on the entire $x$-axis, so our $P(x;0,a)$ is
really just half of the Gaussian curve.] By including the factor $x^n$,
$n>0$, we obtain what we shall term a "weighted Gaussian". It is easy
to show that $P(x;n,a)$ assumes its maximum value at the point $x=\sqrt{n/2a}$;
furthermore, for $n \ge 1$, $P(x;n,a)$ tends to 0 as $x \to 0$, and for all $n$, $P(x;n,a)$
tends to 0 as $x \to \infty$.

If we wish to generate a set of random numbers $\{x_i\}$ according to
$P(x;n,a)$ by the ordinary inversion method, we must first calculate the
distribution function

$$F(x;n,a) \equiv \int_0^x P(x';n,a)\,dx'$$

This calculation is rather lengthy for arbitrary $n$, and is found to
yield

$$F(x;n,a) = \begin{cases} \operatorname{erf}(\sqrt{a}\,x), & n=0 \\[6pt] \operatorname{erf}(\sqrt{a}\,x) - \dfrac{2}{\sqrt{\pi}}\,\sqrt{a}\,x\,\exp(-ax^2) \displaystyle\sum_{\nu=1}^{n/2} \dfrac{(2ax^2)^{\nu-1}}{1\cdot 3\cdot 5\cdots(2\nu-1)}, & n=2,4,\ldots \\[10pt] 1 - \exp(-ax^2) \displaystyle\sum_{\nu=0}^{(n-1)/2} \dfrac{(ax^2)^{\nu}}{\nu!}, & n=1,3,\ldots \end{cases} \tag{2.52}$$

where $\operatorname{erf}(x)$ is the so-called "error function",

$$\operatorname{erf}(x) \equiv \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\,dt, \qquad x \ge 0 \tag{2.53}$$

which is tabulated in many mathematical handbooks. It is clear from (2.52)
that the task of inverting $F(x;n,a)$ is in general not a trivial matter.
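As a sanity check on the closed forms above, the following Python sketch evaluates $F(x;n,a)$ for even and odd $n$ and the density $P(x;n,a)$, so that one can confirm numerically that $dF/dx = P$ and that $F \to 1$ for large $x$. The function names are ours, and the normalization is written in the equivalent gamma-function form $A(n,a) = 2a^{(n+1)/2}/\Gamma\!\left(\tfrac{n+1}{2}\right)$ rather than the three-case form of (2.51b).

```python
import math

def weighted_gaussian_pdf(x, n, a):
    """P(x;n,a) = A(n,a) x^n exp(-a x^2) for x >= 0, zero otherwise."""
    if x < 0:
        return 0.0
    A = 2.0 * a ** ((n + 1) / 2.0) / math.gamma((n + 1) / 2.0)
    return A * x ** n * math.exp(-a * x * x)

def weighted_gaussian_cdf(x, n, a):
    """F(x;n,a) for x >= 0, following the three cases of (2.52)."""
    t = a * x * x
    if n % 2 == 1:
        # Odd n: 1 - exp(-a x^2) * sum_{v=0}^{(n-1)/2} (a x^2)^v / v!
        partial = sum(t ** v / math.factorial(v) for v in range((n - 1) // 2 + 1))
        return 1.0 - math.exp(-t) * partial
    # Even n (n = 0 gives an empty sum, leaving only the erf term).
    total, odd_product = 0.0, 1.0
    for v in range(1, n // 2 + 1):
        odd_product *= 2 * v - 1              # 1 * 3 * 5 * ... * (2v - 1)
        total += (2.0 * t) ** (v - 1) / odd_product
    correction = 2.0 / math.sqrt(math.pi) * math.sqrt(t) * math.exp(-t) * total
    return math.erf(math.sqrt(a) * x) - correction
```

A central difference such as `(weighted_gaussian_cdf(x + h, n, a) - weighted_gaussian_cdf(x - h, n, a)) / (2 * h)` with small `h` should agree with `weighted_gaussian_pdf(x, n, a)` for every `n`.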

Inverting $F(x;n,a)$ is particularly troublesome for $n=0,2,4,\ldots$, since $\operatorname{erf}(x)$ can be calculated
and inverted only by numerical methods. There is in fact only one case for
which $F(x;n,a)$ can be easily handled. This is the case $n=1$, for the
equation

$$r_i = F(x_i;1,a) = 1 - \exp(-a x_i^2)$$

can be easily inverted to obtain

$$x_i = \left[\frac{1}{a}\log\left(\frac{1}{1-r_i}\right)\right]^{1/2} \tag{2.54a}$$

as the algorithm whereby one constructs, from a set of random numbers
$\{r_i\}$ distributed uniformly in the unit interval, a set of random numbers
$\{x_i\}$ distributed according to the density function

$$P(x;1,a) = 2ax\,\exp(-ax^2), \qquad x \ge 0 \tag{2.54b}$$
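The $n=1$ inversion is short enough to state directly in code. The sketch below assumes the inverted form $x = [a^{-1}\log(1/(1-r))]^{1/2}$ obtained by solving $r = 1 - \exp(-ax^2)$; the function names are ours.

```python
import math
import random

def sample_n1(a, rng):
    """One sample from P(x;1,a) = 2 a x exp(-a x^2), x >= 0, by inversion."""
    r = rng.random()                       # uniform on [0, 1), so 1 - r > 0
    return math.sqrt(math.log(1.0 / (1.0 - r)) / a)

def mean_n1(a, n_points, seed=0):
    """Sample mean, to be compared with the exact value sqrt(pi) / (2 sqrt(a))."""
    rng = random.Random(seed)
    return sum(sample_n1(a, rng) for _ in range(n_points)) / n_points
```

Each sample costs exactly one random number, so the method is 100% efficient in the sense used in this chapter.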

A straightforward application of the rejection method to $P(x;n,a)$
is not very satisfactory because $P(x;n,a)$ is non-zero over an infinite
interval. Of course, we might simply put $P(x;n,a)=0$ for all $x$ larger
than some large but finite value $x_0$, but this procedure is rather
arbitrary. Moreover, the efficiency of the rejection method is inversely
proportional to the length of the interval $(a,b)=(0,x_0)$ over which the
initial uniform set $\{x'_i\}$ is taken [see (2.12)], so the larger we take
$x_0$ the more inefficient the rejection method becomes.

We shall now derive two different methods for efficiently generating
random numbers $\{x_i\}$ according to the $n=0$ density function,

$$P(x;0,a) = 2\sqrt{a/\pi}\,\exp(-ax^2), \qquad x \ge 0 \tag{2.55}$$

We shall then show how one can easily construct, from a given set of
random numbers $\{x_i\}$ distributed according to $P(x;0,a)$, another set of
random numbers $\{\rho_i\}$ distributed according to $P(\rho;n,a)$ for any integer
$n>0$.

The first method for generating random numbers $\{x_i\}$ according to
$P(x;0,a)$ essentially consists of a combination of the contraction and
inversion methods, coupled with a suitable transformation of variables.
Consider the auxiliary two-dimensional density function $P(x,y)$, defined by

$$P(x,y) \equiv P(x;0,a)\cdot P(y;0,a) = \begin{cases} \dfrac{4a}{\pi}\exp(-a[x^2+y^2]), & \text{for } x,y \ge 0 \\[6pt] 0, & \text{for } x<0 \text{ or } y<0 \end{cases} \tag{2.56}$$

Clearly, the contracted density functions $P(x)$ and $P(y)$ are

$$P(x) \equiv \int_0^{\infty} P(x,y')\,dy' = P(x;0,a) \tag{2.57a}$$

$$P(y) \equiv \int_0^{\infty} P(x',y)\,dx' = P(y;0,a) \tag{2.57b}$$

so that if we can generate random pairs $\{(x_i, y_i)\}$ according to $P(x,y)$
then the separate coordinate sets $\{x_i\}$ and $\{y_i\}$ will each be a set of
random numbers distributed according to the desired density function.
Moreover, since

$$P(x|y) \equiv P(x,y)/P(y) = P(x;0,a) = P(x) \tag{2.58a}$$

and

$$P(y|x) \equiv P(x,y)/P(x) = P(y;0,a) = P(y) \tag{2.58b}$$

then for any random point $(x_i, y_i)$, a knowledge of $x_i$ tells us nothing about
the possible values of $y_i$; in other words, the sets $\{x_i\}$ and $\{y_i\}$ derived
from the set $\{(x_i, y_i)\}$ are statistically independent of each other. Now,
how can we obtain a set $\{(x_i, y_i)\}$ distributed according to $P(x,y)$ in
(2.56)? Consider the transformation of variables $(x,y) \to (\rho,\theta)$ defined by

$$x = \rho\cos\theta, \qquad y = \rho\sin\theta \tag{2.59}$$


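Since $\partial(x,y)/\partial(\rho,\theta) = \rho$, a distribution of random pairs $\{(x_i, y_i)\}$
with the density function $P(x,y)$ in (2.56) corresponds, under the
transformation (2.59), to a distribution of random pairs $\{(\rho_i, \theta_i)\}$ with
density function [cf. (2.25)-(2.28)]

$$\tilde{P}(\rho,\theta) = P(x,y)\left|\frac{\partial(x,y)}{\partial(\rho,\theta)}\right| = \frac{4a}{\pi}\,\rho\,\exp(-a\rho^2) \tag{2.60}$$

where $0 \le \rho < \infty$ and $0 \le \theta \le \pi/2$. Conditioning $\tilde{P}(\rho,\theta)$ in the form $\tilde{P}(\rho)\cdot\tilde{P}(\theta|\rho)$,
we find

$$\tilde{P}(\rho) \equiv \int_0^{\pi/2} \tilde{P}(\rho,\theta')\,d\theta' = 2a\rho\,\exp(-a\rho^2) \equiv P(\rho;1,a) \tag{2.61a}$$

$$\tilde{P}(\theta|\rho) \equiv \tilde{P}(\rho,\theta)/\tilde{P}(\rho) = 1/(\pi/2) \tag{2.61b}$$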

Now, we have already seen how to generate random numbers $\rho_i$ according to
$P(\rho;1,a)$ [cf. (2.54)]; furthermore, it is trivial to generate random numbers
$\theta_i$ in $(0,\pi/2)$ according to the density function in (2.61b) [cf. (2.10)].
Hence, it is a simple matter to generate random pairs $\{(\rho_i, \theta_i)\}$ according
to $\tilde{P}(\rho,\theta)$. Our algorithm for generating random numbers $\{x_i\}$ according
to $P(x;0,a)$ is therefore as follows: Letting $r_i$ and $r'_i$ denote two random
numbers from a uniform distribution in the unit interval, calculate
[cf. (2.54) and (2.10)]

$$\rho_i = \left[\frac{1}{a}\log\left(\frac{1}{1-r_i}\right)\right]^{1/2} \tag{2.62a}$$

$$\theta_i = \frac{\pi}{2}\,r'_i \tag{2.62b}$$

Then calculate, in accordance with (2.59),

$$x_i = \rho_i\cos\theta_i \tag{2.63a}$$

$$y_i = \rho_i\sin\theta_i \tag{2.63b}$$

The random pairs $\{(x_i, y_i)\}$ generated in this way will evidently be
distributed according to the density function $P(x,y)$ in (2.56).
Therefore, by (2.57), the set $\{x_i\}$ will be distributed according to the density
function $P(x;0,a)$ and the set $\{y_i\}$ will be distributed according to the
density function $P(y;0,a)$. Moreover, because of (2.58) the sets $\{x_i\}$ and
$\{y_i\}$ are statistically independent, so that the numbers $x_i$ and $y_i$ calculated
from the same $\rho_i$ and $\theta_i$ in (2.63) can be used successively without
introducing unwanted correlations. Note that this generating method, which
operationally involves nothing more than the formulae in (2.62) and (2.63),
is actually 100% efficient, in that the two random numbers $r_i$ and $r'_i$
from a uniform distribution in the unit interval actually produce two random
numbers distributed according to the desired density function $P(x;0,a)$.
[Note also that the quantity $(1-r_i)$ in (2.62a) can be replaced by $r_i$,
since both are uniformly distributed random numbers in the unit interval.]
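The polar construction can be sketched as follows, assuming the forms $\rho = [a^{-1}\log(1/(1-r))]^{1/2}$ and $\theta = (\pi/2)\,r'$ for (2.62); the function names are ours. The summary statistics returned by `pair_statistics` give a quick check of both the marginal density and the claimed independence, since for independent half-Gaussians $\langle xy\rangle = \langle x\rangle\langle y\rangle = 1/(\pi a)$.

```python
import math
import random

def half_gaussian_pair(a, rng):
    """Two independent samples from 2 sqrt(a/pi) exp(-a x^2), x >= 0."""
    rho = math.sqrt(math.log(1.0 / (1.0 - rng.random())) / a)   # radial part
    theta = 0.5 * math.pi * rng.random()                         # uniform angle in the first quadrant
    return rho * math.cos(theta), rho * math.sin(theta)

def pair_statistics(a, n_pairs, seed=0):
    """Return sample estimates of <x>, <x^2> and <x y> over many pairs."""
    rng = random.Random(seed)
    sx = sxx = sxy = 0.0
    for _ in range(n_pairs):
        x, y = half_gaussian_pair(a, rng)
        sx += x
        sxx += x * x
        sxy += x * y
    return sx / n_pairs, sxx / n_pairs, sxy / n_pairs
```

The exact values for comparison are $\langle x\rangle = 1/\sqrt{\pi a}$, $\langle x^2\rangle = 1/(2a)$ and $\langle xy\rangle = 1/(\pi a)$.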

We next consider an alternate method of generating random numbers
$\{x_i\}$ according to the density function $P(x;0,a)$. This method consists of
first introducing a change of variables $x \to y$ which transforms the infinite
range $0 \le x < \infty$ into the finite range $0 < y \le 1$, and then applying to the
transformed density function the one-dimensional rejection technique. [This
method is adapted from Fluendy, Ref. 2, p. 77.] The $x \to y$ transformation
used here is

$$y = e^{-\sqrt{a}\,x} \quad \text{or} \quad x = -\frac{1}{\sqrt{a}}\log y \tag{2.64}$$

Under this transformation a set of random numbers $\{x_i\}$ distributed over
the interval $0 \le x < \infty$ according to $P(x;0,a)$ corresponds to a set of random
numbers $\{y_i\}$ distributed over the interval $0 < y \le 1$ according to the density
function [see (2.5)-(2.7)]

$$P(y) = P(x;0,a)\left|\frac{dx}{dy}\right|$$

Using (2.64) and (2.55), we easily find

$$P(y) = \frac{2}{\sqrt{\pi}}\,y^{-1}\exp(-\log^2 y), \qquad 0 < y \le 1 \tag{2.65}$$

It is not difficult to show that $P(y)$ assumes its maximum value at
$y = 1/\sqrt{e}$, and that this maximum value is

$$B = \frac{2}{\sqrt{\pi}}\,e^{1/4} \tag{2.66}$$

Hence, we can generate a random $y_i$-value according to $P(y)$ by repetitively
drawing pairs of random numbers $y'_i$ and $r_i$ from a uniform distribution in
the unit interval until the inequality [cf. (2.11)]

$$P(y'_i)/B \ge r_i$$

or equivalently

$$\left(\tfrac{1}{2} + \log y'_i\right)^2 \le \log\left(\frac{1}{r_i}\right) \tag{2.67}$$

is found to be satisfied. We then take $y_i = y'_i$, and put

$$x_i = -\frac{1}{\sqrt{a}}\log y_i \tag{2.68}$$

in accordance with the transformation (2.64). The efficiency of this
method (i.e., the fraction of the $y'_i$-values which lead to acceptable
$y_i$-values, and hence acceptable $x_i$-values) is found from (2.12) to be

$$E = \frac{\int_0^1 P(y)\,dy}{B\cdot(1-0)} = \frac{1}{B} = \frac{\sqrt{\pi}}{2}\,e^{-1/4} \approx 0.69 \tag{2.69}$$

This efficiency is quite satisfactory; it implies that roughly 2 out
of every 3 $y'_i$-values tried will be accepted.
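The transformed rejection method can be sketched in Python as below, assuming the acceptance test $(\tfrac12 + \log y')^2 \le \log(1/r)$ and the back-transformation $x = -\log(y)/\sqrt{a}$. The function names are ours; the uniform numbers are taken on $(0,1]$ so that the logarithms are always defined.

```python
import math
import random

def half_gaussian_by_rejection(a, rng):
    """One sample from 2 sqrt(a/pi) exp(-a x^2), x >= 0.

    Returns (x, trials), where `trials` counts the (y', r) pairs drawn.
    """
    trials = 0
    while True:
        trials += 1
        y = 1.0 - rng.random()               # uniform on (0, 1]
        r = 1.0 - rng.random()               # uniform on (0, 1]
        if (0.5 + math.log(y)) ** 2 <= math.log(1.0 / r):   # acceptance test (2.67)
            return -math.log(y) / math.sqrt(a), trials       # transform back, (2.68)

def rejection_statistics(a, n_points, seed=0):
    """Return (sample mean of x, observed acceptance fraction)."""
    rng = random.Random(seed)
    total_x, total_trials = 0.0, 0
    for _ in range(n_points):
        x, trials = half_gaussian_by_rejection(a, rng)
        total_x += x
        total_trials += trials
    return total_x / n_points, n_points / total_trials
```

The observed acceptance fraction should settle near $(\sqrt{\pi}/2)\,e^{-1/4} \approx 0.69$, independent of $a$, and the sample mean near $1/\sqrt{\pi a}$.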

We thus have available two methods of rapidly and efficiently
generating random numbers $\{x_i\}$ according to the density function $P(x;0,a)$.
We shall now show how one may use random numbers distributed according
to $P(x;0,a)$ to construct random numbers distributed according to $P(x;n,a)$
for any integer $n>0$. The method is operationally quite simple: If
$x_{1i}, x_{2i}, \ldots, x_{n+1,i}$ are numbers drawn at random from a set $\{x_i\}$ whose
density function is $P(x;0,a)$, then

$$\rho_i = \sqrt{x_{1i}^2 + x_{2i}^2 + \cdots + x_{n+1,i}^2} \tag{2.70}$$

will be a random element from a set $\{\rho_i\}$ whose density function is $P(\rho;n,a)$.
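A short Python sketch of the construction (2.70) follows. As a stand-in for the half-Gaussian generators derived above, it uses the absolute value of the standard library's `random.gauss` with variance $1/(2a)$, whose density on $x \ge 0$ is $2\sqrt{a/\pi}\exp(-ax^2)$; that substitution, and the function names, are ours.

```python
import math
import random

def weighted_gaussian_sample(n, a, rng):
    """One sample from P(rho;n,a), built from n+1 half-Gaussian numbers."""
    sigma = math.sqrt(0.5 / a)               # exp(-x^2 / (2 sigma^2)) = exp(-a x^2)
    total = 0.0
    for _ in range(n + 1):
        x = abs(rng.gauss(0.0, sigma))       # stand-in half-Gaussian generator
        total += x * x
    return math.sqrt(total)                  # rho of (2.70)

def mean_square(n, a, n_points, seed=0):
    """Sample estimate of <rho^2>, whose exact value is (n+1) / (2a)."""
    rng = random.Random(seed)
    return sum(weighted_gaussian_sample(n, a, rng) ** 2 for _ in range(n_points)) / n_points
```

Since each squared component has mean $1/(2a)$, the check $\langle\rho^2\rangle = (n+1)/(2a)$ holds for every $n$; for $n=1$ one can also compare the sample mean of $\rho$ with $\sqrt{\pi}/(2\sqrt{a})$, the mean of $2a\rho\exp(-a\rho^2)$.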

To prove that (2.70) yields the density $P(\rho;n,a)$, consider the $(n+1)$-dimensional density
function

$$P(x_1, x_2, \ldots, x_{n+1}) \equiv \prod_{j=1}^{n+1} P(x_j;0,a) = \begin{cases} 2^{n+1}\left(\dfrac{a}{\pi}\right)^{(n+1)/2} \exp(-a[x_1^2 + \cdots + x_{n+1}^2]), & \text{if all } x_j \ge 0 \\[6pt] 0, & \text{if any } x_j < 0 \end{cases} \tag{2.71}$$

We may generate a random point $(x_{1i}, x_{2i}, \ldots, x_{n+1,i})$ according to this
density function merely by picking each component independently according
to the density function $P(x;0,a)$; this follows because, as may be readily
seen from (2.71),

$$P(x_1) = P(x_1;0,a)$$

$$P(x_2|x_1) = P(x_2;0,a)$$

$$P(x_3|x_1,x_2) = P(x_3;0,a)$$

etc.

Consider next the transformation of variables $(x_1, x_2, \ldots, x_{n+1}) \to (\rho, \alpha_1, \alpha_2, \ldots, \alpha_n)$
which corresponds to a change from the Cartesian representation of an
$(n+1)$-dimensional vector to a polar representation. Here, $\rho$ is the "length"
of the $(n+1)$-dimensional vector, while the $\alpha_j$'s are certain angles or cosines
thereof. For example, for $n=1$ we have $\alpha_1=\theta$, with

$$x_1 = \rho\cos\theta, \quad x_2 = \rho\sin\theta, \qquad \rho^2 = x_1^2 + x_2^2, \qquad \frac{\partial(x_1,x_2)}{\partial(\rho,\theta)} = \rho$$

and for $n=2$ we have $\alpha_1=\cos\theta$ and $\alpha_2=\phi$, with

$$x_1 = \rho\sin\theta\cos\phi, \quad x_2 = \rho\sin\theta\sin\phi, \quad x_3 = \rho\cos\theta, \qquad \rho^2 = x_1^2 + x_2^2 + x_3^2, \qquad \left|\frac{\partial(x_1,x_2,x_3)}{\partial(\rho,\cos\theta,\phi)}\right| = \rho^2$$

In general, the transformation we consider has the properties that

$$\rho^2 = x_1^2 + x_2^2 + \cdots + x_{n+1}^2 \tag{2.72a}$$

and

$$\left|\frac{\partial(x_1, x_2, \ldots, x_{n+1})}{\partial(\rho, \alpha_1, \ldots, \alpha_n)}\right| = \rho^n \tag{2.72b}$$
From (2.25)-(2.28) it follows that a set of random $(n+1)$-tuples
$\{(x_{1i}, x_{2i}, \ldots, x_{n+1,i})\}$ distributed according to the density function
$P(x_1, x_2, \ldots, x_{n+1})$ corresponds to a set of random $(n+1)$-tuples
$\{(\rho_i, \alpha_{1i}, \ldots, \alpha_{ni})\}$ distributed according to the density function

$$\tilde{P}(\rho, \alpha_1, \ldots, \alpha_n) = P(x_1, \ldots, x_{n+1}) \left|\frac{\partial(x_1, x_2, \ldots, x_{n+1})}{\partial(\rho, \alpha_1, \ldots, \alpha_n)}\right| = 2^{n+1}\left(\frac{a}{\pi}\right)^{(n+1)/2} \exp(-a\rho^2)\,\rho^n \equiv C(n,a)\,\rho^n \exp(-a\rho^2) \tag{2.73}$$

The fully contracted $\rho$-density function is therefore

$$\tilde{P}(\rho) \equiv \int d\alpha_1 \cdots \int d\alpha_n\, \tilde{P}(\rho, \alpha_1, \ldots, \alpha_n) = C'(n,a)\,\rho^n \exp(-a\rho^2) = A(n,a)\,\rho^n \exp(-a\rho^2) \equiv P(\rho;n,a) \tag{2.74}$$

Here, the second equality follows from the fact that $\tilde{P}(\rho, \alpha_1, \ldots, \alpha_n)$ is
independent of each $\alpha_j$, and the last equality follows by simply recognizing
that $\tilde{P}(\rho)$ must in any case be correctly normalized. Hence, we have shown
that the quantity $\rho$ defined in (2.72a) is distributed according to
$P(\rho;n,a)$. This establishes the simple construction algorithm (2.70).

Actually, the algorithm (2.70) is merely a generalization of a
familiar result in statistical mechanics: For gas molecules in thermal
equilibrium, each Cartesian component of the molecular velocity, e.g. $v_x$,
is distributed according to the density function $\exp(-mv_x^2/2kT)$;
consequently the molecular speed $v \equiv (v_x^2 + v_y^2 + v_z^2)^{1/2}$ is distributed according
to $v^2\exp(-mv^2/2kT)$.

In conclusion, we see that we may generate random numbers $\{x_i\}$
according to the weighted Gaussian function $P(x;n,a)$ in (2.51) either
by numerically inverting the distribution function $F(x;n,a)$ in (2.52),
or by first generating random numbers according to $P(x;0,a)$ via either
(2.62)-(2.63) or (2.67)-(2.68) and then using (2.70).

2-10. An Example: Uniform Distribution of Non-overlapping Rods on a Line

Consider the problem of distributing $N$ line segments or "rods", each
of length $a$, randomly and uniformly inside the $x$-axis interval $(0,L)$,
subject to the constraint that none of the rods overlap. We assume that

$$L > Na \tag{2.75}$$

so that the interval $(0,L)$ is indeed large enough to accommodate all the
rods.

One way of proceeding on this problem would be to scatter the rods
randomly, uniformly and independently inside the interval $(0,L)$ until we
come by chance upon a configuration in which none of the rods overlap.
In this approach, we first draw $N$ random numbers $r_1, \ldots, r_N$ from the
uniform distribution in the unit interval, and we tentatively locate
the center of rod $k$ at

$$x_k = \frac{a}{2} + (L - a)\,r_k, \qquad k=1,2,\ldots,N \tag{2.76a}$$

The resulting configuration $(x_1, \ldots, x_N)$ is then accepted if it is found
to satisfy the no-overlap condition

$$|x_k - x_j| > a, \qquad \text{all } k \ne j \tag{2.76b}$$

If this condition is not satisfied then the configuration is rejected [the
entire configuration, not just those $x_k$'s which are found to violate (2.76b)],
and we must try again using a different set of random numbers $r_1, \ldots, r_N$
from the uniform distribution in the unit interval. This procedure is
feasible if it turns out that a reasonable fraction of the
configurations generated in (2.76a) actually satisfies (2.76b). As we shall prove
later [cf. (2.96)], this fraction is in fact given by

$$\frac{\text{acceptable configurations}}{\text{trial configurations}} = \left(\frac{L - Na}{L - a}\right)^N \tag{2.76c}$$

For $N=100$, $a=1$ and $L=200$ this acceptance ratio is $(100/199)^{100} = 1.3\times 10^{-30}$,
which is clearly too small by any standard.
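The acceptance ratio can be checked numerically. The Python sketch below assumes the trial centers are placed as $x_k = a/2 + (L-a)\,r_k$ (the choice consistent with the enclosing box used later in (2.95)) and that the ratio is $[(L-Na)/(L-a)]^N$; the function names are ours.

```python
import random

def acceptance_ratio(N, a, L):
    """Predicted fraction of trial configurations with no overlapping rods."""
    return ((L - N * a) / (L - a)) ** N

def simulate_acceptance(N, a, L, n_trials, rng):
    """Estimate the same fraction by brute-force scattering of rod centers."""
    accepted = 0
    for _ in range(n_trials):
        centers = sorted(a / 2 + (L - a) * rng.random() for _ in range(N))
        # After sorting, only neighbouring rods need to be compared.
        if all(right - left > a for left, right in zip(centers, centers[1:])):
            accepted += 1
    return accepted / n_trials
```

For the case quoted in the text, `acceptance_ratio(100, 1.0, 200.0)` evaluates $(100/199)^{100}$, while for a small system such as $N=3$, $a=1$, $L=10$ the simulated fraction should be close to $(7/9)^3 \approx 0.47$.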

Since the simple rejection generating method just outlined is not
generally feasible, let us try to devise an algorithm based on the
inversion generating method. First, though, let us restate the problem in
a way which shows clearly that we are in fact trying to generate a "point"
randomly and uniformly inside a given "region".

Imagine the rods to be laid out on the $x$-axis in the interval $(0,L)$
in any non-overlapping configuration. Let the rods be numbered from
right to left, so that the nearest rod to the left of rod $k$ is always
rod $k+1$, and let $x_k$ locate the center of rod $k$ [see Fig. 5]. Now
regard the $N$ variables $x_1, x_2, \ldots, x_N$ as Cartesian coordinates in an
$N$-dimensional hyperspace. Any point in this hyperspace specifies through
the values of its coordinates a "configuration" of the rods; however,
not every point in this space will satisfy the requirements that the rods
be non-overlapping and that the rods be numbered in order from right to
left. Let $\Omega$ be defined as the set of all points $(x_1, x_2, \ldots, x_N)$ in the
$N$-dimensional configuration space which do satisfy these two requirements;
thus, $\Omega$ is defined by [see Fig. 5]

$$\Omega \equiv \{(x_1, x_2, \ldots, x_N) \mid x_{k+1} + a < x_k < x_{k-1} - a,\; k=1,\ldots,N\} \tag{2.77}$$

where $x_0$ and $x_{N+1}$ are defined by

$$x_0 \equiv L + a/2 \tag{2.78}$$

$$x_{N+1} \equiv -a/2 \tag{2.79}$$

With (2.78) and (2.79) the conditions $x_1 < x_0 - a$ and $x_{N+1} + a < x_N$ in (2.77)
become respectively

$$x_1 < L - a/2 \quad \text{and} \quad x_N > a/2$$

which conditions evidently insure that rod 1 lies inside the right
boundary and rod $N$ lies inside the left boundary [see Fig. 5].

Simply stated, our problem is to generate a point randomly and
uniformly inside the $N$-dimensional region $\Omega$; that is, we wish to generate
a random $N$-tuple $(x_1, \ldots, x_N)$ according to the density function

$$P(x_1, \ldots, x_N) = \begin{cases} |\Omega|^{-1}, & \text{if } (x_1, \ldots, x_N) \in \Omega \\ 0, & \text{if } (x_1, \ldots, x_N) \notin \Omega \end{cases} \tag{2.80}$$

where $|\Omega|$ is the volume of the region $\Omega$. Our procedure will be to use
the generalized inversion method as described in Sections 2-5 and 2-6.
For this we shall first need to conduct a detailed analysis of the
mathematical properties of the region $\Omega$.

Consider first the variable $x_N$. From Fig. 5 it is clear that the
minimum possible value for $x_N$ is $a/2 \equiv x_{N+1} + a$; the maximum possible value
for $x_N$ occurs when the other $(N-1)$ rods are jammed against the right
wall, and is $L - (N-1)a - a/2 \equiv L_N$. For any given $x_N$ in the interval $(x_{N+1}+a, L_N)$,
the minimum possible value for $x_{N-1}$ is $x_N + a$; the maximum possible value
for $x_{N-1}$ occurs when the remaining $(N-2)$ rods are jammed against the
right wall, and is $L - (N-2)a - a/2 \equiv L_{N-1}$. Continuing with this line of
reasoning, we see that the volume $\Omega$ defined in (2.77) can also be specified
in the following way:

$$\Omega = \{(x_1, \ldots, x_N) \mid x_{k+1} + a < x_k < L_k,\; k=1,\ldots,N\} \tag{2.81}$$

where the constants $L_k$ are given by

$$L_k \equiv L - (k-1)a - a/2, \qquad k=1,2,\ldots,N+1 \tag{2.82}$$

or equivalently by the recursive formulae

$$L_1 = L - a/2 \tag{2.83a}$$

$$L_{k+1} = L_k - a, \qquad k=1,2,\ldots,N \tag{2.83b}$$

[We allow (2.82) and (2.83) to define the quantity $L_{N+1}$ which, although
not appearing in (2.81), will be convenient for later formulae.]
The advantage of (2.81) over (2.77) is that it "orders" the coordinates
in the manner of (2.31a), thereby allowing us to employ the techniques
outlined in Sec. 2-6.

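The volume $|\Omega|$ is given by the $N$-fold integral

$$|\Omega| = \int_{x_{N+1}+a}^{L_N} dx_N \int_{x_N+a}^{L_{N-1}} dx_{N-1} \cdots \int_{x_2+a}^{L_1} dx_1 \tag{2.84}$$

The unconditioned density function for $x_N$ is given by [cf. (2.22a),
(2.80) and (2.81)]

$$P(x_N) = \int_{x_N+a}^{L_{N-1}} dx_{N-1} \int_{x_{N-1}+a}^{L_{N-2}} dx_{N-2} \cdots \int_{x_2+a}^{L_1} dx_1\, |\Omega|^{-1} \tag{2.85a}$$

The density function for $x_k$ conditioned on $x_{k+1}, x_{k+2}, \ldots, x_N$, for
$2 \le k \le N-1$, is given by [cf. (2.22b), (2.80), (2.81)]

$$P(x_k|x_{k+1}, \ldots, x_N) = \frac{\displaystyle\int_{x_k+a}^{L_{k-1}} dx_{k-1} \cdots \int_{x_2+a}^{L_1} dx_1\, |\Omega|^{-1}}{\displaystyle\int_{x_{k+1}+a}^{L_k} dx_k \int_{x_k+a}^{L_{k-1}} dx_{k-1} \cdots \int_{x_2+a}^{L_1} dx_1\, |\Omega|^{-1}} \tag{2.85b}$$

And the density function for $x_1$ conditioned on $x_2, x_3, \ldots, x_N$ is given by
[cf. (2.22c), (2.80) and (2.81)]

$$P(x_1|x_2, \ldots, x_N) = \frac{|\Omega|^{-1}}{\displaystyle\int_{x_2+a}^{L_1} dx_1\, |\Omega|^{-1}} \tag{2.85c}$$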

In order to calculate the foregoing quantities, it is convenient to introduce
the auxiliary quantities $V_0, V_1, \ldots, V_N$ defined by

$$V_0 \equiv 1 \tag{2.86a}$$

$$V_k \equiv \int_{x_{k+1}+a}^{L_k} dx_k \int_{x_k+a}^{L_{k-1}} dx_{k-1} \cdots \int_{x_2+a}^{L_1} dx_1, \qquad k=1,2,\ldots,N \tag{2.86b}$$

In terms of the quantities $V_k$ we have from (2.84)

$$|\Omega| = V_N \tag{2.87}$$

and from (2.85a)-(2.85c)

$$P(x_k|x_{k+1}, \ldots, x_N) = V_{k-1}/V_k, \qquad k=1,2,\ldots,N \tag{2.88}$$

provided, for $k=N$, $P(x_N|x_{N+1}, \ldots, x_N)$ is understood to represent $P(x_N)$.
Next we shall derive an explicit formula for $V_k$, so that the important
quantities above can be calculated.


For $k=1$ we have

$$V_1 = \int_{x_2+a}^{L_1} dx_1 = L_1 - (x_2 + a) = (L_1 - a) - x_2 = L_2 - x_2$$

where in the last step we invoked (2.83b). Thus,

$$V_1 = \frac{1}{1!}\,(L_2 - x_2)^1$$

Now suppose that, for any $k \ge 1$, $V_k$ is given by

$$V_k = \frac{1}{k!}\,(L_{k+1} - x_{k+1})^k \tag{2.89}$$

Then from (2.86b) we have

$$V_{k+1} = \int_{x_{k+2}+a}^{L_{k+1}} dx_{k+1}\, V_k = \frac{1}{k!} \int_{x_{k+2}+a}^{L_{k+1}} (L_{k+1} - x_{k+1})^k\, dx_{k+1}$$

$$= \frac{1}{k!} \int_{L_{k+1}-(x_{k+2}+a)}^{L_{k+1}-L_{k+1}} z^k\,(-dz) = \frac{1}{k!} \int_0^{L_{k+1}-a-x_{k+2}} z^k\, dz$$

$$= \frac{1}{k!} \left.\frac{z^{k+1}}{k+1}\right|_0^{L_{k+1}-a-x_{k+2}} = \frac{1}{(k+1)!}\,(L_{k+1} - a - x_{k+2})^{k+1}$$

or

$$V_{k+1} = \frac{1}{(k+1)!}\,(L_{k+2} - x_{k+2})^{k+1}$$

where in the second line we put $z = L_{k+1} - x_{k+1}$, and in the last line we
used (2.83b). We have thus proved by induction that (2.89) holds for
all $k \ge 1$; furthermore, it is seen that for $k=0$ (2.89) gives $V_0 = 1$, in
agreement with the definition in (2.86a). Therefore, (2.89) gives $V_k$ for all
values of $k$ as defined in (2.86a) and (2.86b).

Inserting (2.89) into (2.87), and invoking the definitions of
$L_{N+1}$ and $x_{N+1}$ in (2.82) and (2.79), we find

$$|\Omega| = V_N = (L_{N+1} - x_{N+1})^N/N! = \left(L - Na - \tfrac{a}{2} + \tfrac{a}{2}\right)^N/N!$$

so

$$|\Omega| = (L - Na)^N/N! \tag{2.90}$$

Inserting (2.89) into (2.88) we find

$$P(x_k|x_{k+1}, \ldots, x_N) = \frac{V_{k-1}}{V_k} = \frac{(L_k - x_k)^{k-1}/(k-1)!}{(L_{k+1} - x_{k+1})^k/k!}$$

or

$$P(x_k|x_{k+1}, \ldots, x_N) \equiv P(x_k|x_{k+1}) = \frac{k\,(L_k - x_k)^{k-1}}{(L_{k+1} - x_{k+1})^k} \tag{2.91}$$

where we have observed that the density function for $x_k$ conditioned on
$x_{k+1}, \ldots, x_N$ is in fact independent of $x_{k+2}, \ldots, x_N$. The physical
reason for this is that the left boundary for rod $k$ is determined solely
by the position of rod $k+1$, and is independent of the positions of all
rods to the left of rod $k+1$. It is of course understood that (2.91) gives
$P(x_k|x_{k+1}, \ldots, x_N)$ only for $x_k$ in the interval $(x_{k+1}+a, L_k)$, as prescribed
by (2.81); the density function vanishes identically for $x_k$ outside this
interval. The formula (2.91) is valid for all $k=1,2,\ldots,N$ provided we
keep in mind that, when $k=N$, the density function is to be regarded as
$P(x_N)$.

Our next step is to calculate the one-variable distribution functions
$F(x_k|x_{k+1})$ corresponding to the one-variable density functions $P(x_k|x_{k+1})$
in (2.91). Following (2.23) we have

$$F(x_k|x_{k+1}) \equiv \int_{x_{k+1}+a}^{x_k} P(x'_k|x_{k+1})\,dx'_k = \frac{k}{(L_{k+1} - x_{k+1})^k} \int_{x_{k+1}+a}^{x_k} (L_k - x'_k)^{k-1}\,dx'_k$$

With a change of variable $z = L_k - x'_k$ the integration is easily accomplished.
Using (2.83b) the result takes the form

$$F(x_k|x_{k+1}) = 1 - \left(\frac{L_k - x_k}{L_{k+1} - x_{k+1}}\right)^k \tag{2.92}$$

which, as before, holds for $x_k$ in the interval $(x_{k+1}+a, L_k)$ and for all
$k=1,2,\ldots,N$. As a check, one can easily verify that $F(x_k|x_{k+1})$ equals
zero at the lower limit of $x_k$ [cf. (2.83b)] and equals unity at the upper
limit of $x_k$.

We are now in a position to apply the generalized inversion method
described in connection with Eqs. (2.29). Thus, we pick $N$ random numbers
$r_1, r_2, \ldots, r_N$ from the uniform distribution in the unit interval, and we
solve the equations

$$r_k = F(x_k|x_{k+1}) \tag{2.93}$$

for $x_k$ in the order $k=N,N-1,\ldots,2,1$ as dictated by our conditioning
procedure. Substituting (2.92) into (2.93), and recognizing that $1-r_k$ can
be replaced by $r_k$ (since both are uniformly distributed random numbers
in the unit interval) we have

$$\left(\frac{L_k - x_k}{L_{k+1} - x_{k+1}}\right)^k = r_k$$

Solving for $x_k$ gives the final generating formula:

$$x_k = L_k - (L_{k+1} - x_{k+1})\,r_k^{1/k}, \qquad k=N,N-1,\ldots,1 \tag{2.94}$$


Therefore, the procedure for generating an ordered, non-overlapping
but otherwise uniform random configuration of N rods of length a inside
the x-axis interval (0,L) is as follows: First calculate (and store
if more than one configuration is to be generated) the N+2 constants
$L_1, L_2, \ldots, L_{N+1}$ and $x_{N+1}$ in accordance with (2.79) and (2.82) or (2.83).
Then draw N random numbers $r_1, r_2, \ldots, r_N$ from the uniform distribution
in the unit interval, and compute $x_k$, the location of the center of rod k,
from the formula in (2.94) for the successive k values $N,N-1,\ldots,2,1$.
Notice that (2.94) is to be applied in order of descending k, because
the formula for $x_k$ requires a value for $x_{k+1}$. Essentially, (2.94) just
"deals" the rods out on the interval (0,L) from left to right, and our
theory assures us that the resultant configuration is acceptable without
any further checking.
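
As a rough illustration, the generating procedure just described can be sketched in a few lines of Python (our choice of language for these sketches; it is not part of the original development). Because Eqs. (2.79), (2.82) and (2.83) are not repeated in this section, the sketch assumes the constants $L_k = L - (k - 1/2)a$ for $k=1,\ldots,N+1$ and $x_{N+1} = -a/2$, which is the choice that makes (2.92) vanish at $x_k = x_{k+1}+a$ and confines the rod centers to $(a/2,\, L-a/2)$. The function name `generate_rods` is hypothetical.

```python
import random


def generate_rods(n_rods, a, length, rng=random):
    """Return centers [x_1, ..., x_N], with x_1 > ... > x_N, of N rods of length a in (0, length)."""
    if n_rods < 1 or n_rods * a >= length:
        raise ValueError("need N >= 1 and N*a < L")
    # ASSUMED constants (stand-ins for Eqs. (2.79), (2.82), (2.83)):
    #   L_k = L - (k - 1/2) a  for k = 1..N+1,   x_{N+1} = -a/2
    big_l = [None] + [length - (k - 0.5) * a for k in range(1, n_rods + 2)]
    x = [None] * (n_rods + 2)
    x[n_rods + 1] = -0.5 * a
    # Apply (2.94) in order of descending k, since x_k requires x_{k+1}.
    for k in range(n_rods, 0, -1):
        r_k = rng.random()
        x[k] = big_l[k] - (big_l[k + 1] - x[k + 1]) * r_k ** (1.0 / k)
    return x[1:n_rods + 1]


if __name__ == "__main__":
    centers = generate_rods(5, a=1.0, length=10.0, rng=random.Random(1))
    print(centers)
```

Every configuration returned by this sketch should have neighboring centers at least a apart, so no acceptance test is needed after generation.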

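The rejection technique described in connection with Eqs. (2.76)
evidently generates random points uniformly inside the N-dimensional
box

$$\Sigma \equiv \left\{ (x_1,\ldots,x_N) \;\middle|\; \tfrac{a}{2} \le x_k \le L - \tfrac{a}{2},\; k=1,\ldots,N \right\} \tag{2.95}$$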


and then rejects those points which do not also lie inside $\Omega$ (which is
clearly a subregion of $\Sigma$). The efficiency of this method is

$$E = \frac{|\Omega|}{|\Sigma|} = \frac{(L-Na)^N/N!}{(L-a)^N}$$

or

$$E = \frac{1}{N!}\left(1 - \frac{(N-1)a}{L-a}\right)^N \tag{2.96}$$


The factor 1/N! in (2.96) simply reflects the fact that the $x_k$-values
generated by this method are generally not ordered according to
$x_1 > x_2 > \cdots > x_N$; hence, the second factor in (2.96) accounts for the
no-overlap acceptance ratio, which we have anticipated in (2.76c). If we had a=0,
so that we would be generating N points randomly and uniformly inside
(0,L), then overlap would clearly not be a problem; however, if it is
important to have the points ordered, then the inversion method would
still be preferable for large N in that it evidently accomplishes this
ordering automatically.
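
To get a feeling for how quickly the rejection efficiency falls off with N, one can simply evaluate the ratio $|\Omega|/|\Sigma| = [(L-Na)^N/N!]/(L-a)^N$. The short sketch below does this; the function name and the sample values of a and L are illustrative assumptions only.

```python
from math import factorial


def rejection_efficiency(n_rods, a, length):
    """Efficiency E = |Omega| / |Sigma| of the rejection technique, as in (2.96)."""
    if n_rods * a >= length:
        return 0.0  # the rods do not fit, so no point is ever accepted
    return ((length - n_rods * a) / (length - a)) ** n_rods / factorial(n_rods)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, rejection_efficiency(n, a=1.0, length=20.0))
```

The 1/N! factor alone already makes the rejection technique impractical for more than a handful of rods when ordering is required.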

The generating method developed here can be easily extended to generate
a random, uniform distribution of equal-length, non-overlapping
rods (or more precisely "arcs") around the boundary of a circle. Again
denoting the length of each rod by a, it will be seen from Fig. 6 that
the problem of N rods on a line of length L is equivalent to the problem
of N+1 rods on a circle of circumference C=L+a or radius

$$R = (L+a)/2\pi \tag{2.97}$$


Essentially, the edges of the first rod laid down (rod N+1) form the boundaries
of the line segment $0 < x < L$, which we imagine to be wrapped around
the circle as shown in Fig. 6. Thus, letting the angle $\theta_k$ locate the center
of rod k relative to any chosen axis, we first locate rod N+1 by putting

$$\theta_{N+1} = 2\pi r_{N+1} \tag{2.98a}$$


where $r_{N+1}$ is a random number from the uniform distribution in the unit
interval. Then letting x measure the circumferential length from the
leading edge of rod N+1 (x=0) to its trailing edge (x=C-a=L), we proceed
to distribute the remaining N rods in $0 < x < L$ as before. Thus, the
angular location of rod k is [see Fig. 6]

$$\theta_k = \theta_{N+1} + (a/2 + x_k)/R \tag{2.98b}$$

where $x_k$ is generated according to (2.94).
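
The circular version can be sketched in the same spirit. The block below is a minimal illustration of (2.97) and (2.98), not a definitive implementation; as in the linear case it assumes $L_k = L - (k - 1/2)a$ and $x_{N+1} = -a/2$ for the constants entering (2.94), and the function names are hypothetical.

```python
import math
import random


def rods_on_line(n_rods, a, length, rng):
    """Centers of N rods on (0, length) from (2.94), using the ASSUMED constants
    L_k = length - (k - 1/2) a and x_{N+1} = -a/2."""
    x_next = -0.5 * a
    centers = []
    for k in range(n_rods, 0, -1):
        l_k = length - (k - 0.5) * a
        l_next = length - (k + 0.5) * a
        x_k = l_k - (l_next - x_next) * rng.random() ** (1.0 / k)
        centers.append(x_k)
        x_next = x_k
    return centers  # ordered x_N, x_{N-1}, ..., x_1


def arcs_on_circle(n_line, a, length, rng):
    """Return (R, angles in radians) for n_line + 1 arcs of length a on a circle."""
    if n_line * a >= length:
        raise ValueError("need N*a < L")
    radius = (length + a) / (2.0 * math.pi)                 # (2.97)
    theta_first = 2.0 * math.pi * rng.random()              # (2.98a): rod N+1
    angles = [theta_first]
    for x_k in rods_on_line(n_line, a, length, rng):
        angles.append(theta_first + (a / 2.0 + x_k) / radius)   # (2.98b)
    return radius, [t % (2.0 * math.pi) for t in angles]


if __name__ == "__main__":
    print(arcs_on_circle(4, a=1.0, length=9.0, rng=random.Random(3)))
```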

As  a final  comment,  and  in  anticipation  of  the  development  to  be 
presented  in  the  next  chapter,  we  might  point  out  that  the  generating 
algorithm  developed  in  this  section  has  potential  applicability  in  the 
calculation  of  the  thermodynamic  properties  of  a "one-dimensional  gas 
of  hard-core  rods".  This  is  the  one-dimensional  analogue  of  a (non-ideal) 
gas  composed  of  spherically  symmetric  molecules,  which  are  assumed  to  have 
the  property  that  an  infinite  repulsive  force  develops  between  any  two 
molecules  when  the  distance  between  their  centers  becomes  equal  to  some 
fixed  value  a>0;  a is  called  the  "hard  core  diameter"  of  the  molecules. 
Suppose a one-dimensional gas of N rods with hard-core diameter a is
enclosed in the "volume" $0 < x < L$, and is allowed to come to thermodynamic
equilibrium with its surroundings at some absolute temperature T. The
theory of statistical mechanics then tells us that the equilibrium value
of any dynamical quantity f which depends only on the positions of the
rods may be calculated as

$$\langle f \rangle_{eq} = \frac{\int_\Omega f(x)\exp[-U(x)/kT]\,dx}{\int_\Omega \exp[-U(x)/kT]\,dx} \tag{2.99}$$

Here, $x \equiv (x_1, x_2, \ldots, x_N)$ denotes a point in the allowable configuration
space of the gas, which evidently is precisely the volume $\Omega$ in (2.77);
f(x) is the value of the dynamical quantity f for the configuration x;
U(x) is the total potential energy of the rods for the configuration x;
and k is Boltzmann's constant. Analytical evaluations of the integrals
in (2.99) have been successfully carried out for certain special forms of
f(x) and U(x). However, as we shall see in the next chapter, the availability
of an efficient algorithm for generating points $x_i$ randomly and
uniformly inside $\Omega$ opens up the possibility of numerically estimating
these integrals by Monte Carlo methods for rather general forms of f(x)
and U(x). A drawback of the Monte Carlo approach, as compared to a purely
analytical approach, is that the dependence of $\langle f \rangle_{eq}$ on such external
parameters as T, N and L can be inferred only by making separate calculations
at specified values of these parameters; in addition, limitations
on computation time will clearly place an upper limit on the size of N.
Nevertheless, in cases where an analytical calculation simply cannot be
effected, a series of Monte Carlo calculations, however restricted, may
yield useful and otherwise unobtainable information.


Chapter  3 


MONTE  CARLO  ESTIMATION  OF  INTEGRALS 


3-1.  Averages  and  Integrals 

Let f(x) denote any bounded function defined in the n-dimensional
space of the variable $x \equiv (x^{(1)}, x^{(2)}, \ldots, x^{(n)})$, and let P(x) denote a probability
density function defined in this same space. We define "the average of f(x)
taken over the set of random points $\{x_i\}$ distributed according to the
density function P(x)" by†

$$\langle f:P \rangle \equiv \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i) \tag{3.1}$$


whenever the limit exists. It is important to recognize that the average
of f(x) depends not only on the function f but also on the density function P
which defines the set of random points $\{x_i\}$ over which the average is taken.
Thus, if we denoted the average of f by simply $\langle f \rangle$, our notation would be
ambiguous and incomplete.

† Throughout this paper, the colon within a mathematical expression can
be verbalized as "with respect to". Thus, for example, $\langle f:P \rangle$ is read
"the average of f(x) with respect to the density function P(x)".

In the limit of sufficiently large N we may expect that $P(x)\,dx$
accurately represents the fraction of the random points $x_1, x_2, \ldots, x_N$ being
summed over in (3.1) which falls inside the infinitesimal region dx centered
at the point x. Therefore, in the limit of large N the contribution to
the sum on the right side of (3.1) due to points which lie inside dx
at x is just $f(x) \cdot N P(x)\,dx$. The sum in (3.1) can thus be calculated by
summing (integrating) this quantity over all such infinitesimal regions
in x-space:

$$(\lim N\to\infty): \qquad \sum_{i=1}^{N} f(x_i) = \int f(x)\, N P(x)\,dx \tag{3.2}$$

Dividing through by N and comparing with (3.1) gives the important result

$$\int f(x) P(x)\,dx = \langle f:P \rangle \tag{3.3}$$


This result says that the integral of the quantity f(x)P(x) over all
x-space is equal to the average of f(x) taken with respect to the set of
random points $\{x_i\}$ distributed according to the density function P(x).
It forms the basis for the "Monte Carlo method" of evaluating a precisely
defined definite integral as an average taken over a suitable set of random
points. In particular, suppose the density function P(x) defines the set
of random points $\{x_i\}$ distributed uniformly over some given finite region
$\Omega$ of x-space; i.e., suppose P(x) is given by

$$P_\Omega(x) = \begin{cases} |\Omega|^{-1}, & \text{if } x \in \Omega \\ 0, & \text{if } x \notin \Omega \end{cases} \tag{3.4}$$


where $|\Omega|$ is the volume of $\Omega$. Then we have from (3.3) and (3.4)

$$\langle f:P_\Omega \rangle = \int_{\infty} f(x) P_\Omega(x)\,dx = \int_\Omega f(x)\,|\Omega|^{-1}\,dx$$

whence

$$\int_\Omega f(x)\,dx = |\Omega|\,\langle f:P_\Omega \rangle \tag{3.5}$$

Thus, the integral of f(x) over the region $\Omega$ is equal to the volume of
$\Omega$ times the average of f(x) with respect to the set of random points
$\{x_i\}$ distributed uniformly over $\Omega$. This result is evidently a special
case of (3.3).

If we could actually calculate the average $\langle f:P \rangle$ as defined in (3.1),
then (3.3) would provide us with our Monte Carlo method for calculating
definite integrals, and our story would be finished. In practice, of
course, all we can really do is calculate quantities such as

$$\langle f:P \rangle_N \equiv \frac{1}{N} \sum_{i=1}^{N} f(x_i) \tag{3.6}$$

for some finite N. Since by definition $\langle f:P \rangle_N \to \langle f:P \rangle$ as $N\to\infty$, we may
expect that if N is taken "fairly large"

$$\langle f:P \rangle_N \approx \langle f:P \rangle, \tag{3.7}$$

so that the integral on the left of (3.3) is given approximately by the
N-term average on the right of (3.6). But how good is this approximation?
Clearly the approximation is worthless from a practical point of view
unless we can give a meaningful estimate of its associated uncertainty.
This important matter is the topic of the next section.
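
Before turning to uncertainties, it may help to see (3.5) and (3.6) at work on an integral whose value is known. The sketch below is only an illustration under our own assumptions: the region is taken to be the one-dimensional interval (0,2), so that its volume is 2, the integrand is $f(x)=x^2$, whose integral is 8/3, and the function name is hypothetical.

```python
import random


def n_term_average(f, n_points, lo, hi, rng):
    """The N-term average (3.6) for points drawn uniformly from the interval (lo, hi)."""
    total = 0.0
    for _ in range(n_points):
        total += f(rng.uniform(lo, hi))
    return total / n_points


if __name__ == "__main__":
    rng = random.Random(0)
    lo, hi = 0.0, 2.0
    volume = hi - lo                       # |Omega| for a one-dimensional interval
    for n_points in (100, 10_000, 100_000):
        estimate = volume * n_term_average(lambda x: x * x, n_points, lo, hi, rng)
        print(n_points, estimate)          # the exact integral is 8/3
```

The printed values scatter about 8/3, and the scatter shrinks as N grows; how fast it shrinks is exactly the question left open here.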
3-2. Fluctuations and Uncertainties


The short derivation of (3.3) given in the preceding section is
useful for conveying an intuitive feeling for the central idea of the
Monte Carlo method, but it is of little practical use in view of the
fact that we can calculate only $\langle f:P \rangle_N$ in (3.6) and not $\langle f:P \rangle$ in (3.1).
In fact, the goodness of the approximation in (3.7) can be obtained only
by resorting to a famous but difficult-to-prove result in probability
theory called the Central Limit Theorem. In order to state this theorem
and to see how it applies to our problem here, it is first necessary to
introduce a few concepts and definitions from probability theory.

Suppose we have a set of random real numbers $\{y_i\}$ distributed according
to some density function $\tilde{P}(y)$. If g(y) is any function of y, then, as we
have seen in (3.1)-(3.3), the average of g(y) over the set $\{y_i\}$ is given by

$$\langle g:\tilde{P} \rangle \equiv \lim_{K\to\infty} \frac{1}{K} \sum_{j=1}^{K} g(y_j) = \int_{-\infty}^{\infty} g(y)\tilde{P}(y)\,dy \tag{3.8}$$

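In particular, we define the mean m and variance $\sigma^2$ of the set $\{y_i\}$ to
be the averages over $\{y_i\}$ of the respective functions g(y)=y and $g(y)=(y-m)^2$,
assuming as we do that these averages exist:

$$m \equiv \langle y:\tilde{P} \rangle = \lim_{K\to\infty} \frac{1}{K} \sum_{j=1}^{K} y_j = \int_{-\infty}^{\infty} y\,\tilde{P}(y)\,dy \tag{3.9}$$

$$\sigma^2 \equiv \langle (y-m)^2:\tilde{P} \rangle = \lim_{K\to\infty} \frac{1}{K} \sum_{j=1}^{K} (y_j - m)^2 = \int_{-\infty}^{\infty} (y-m)^2\,\tilde{P}(y)\,dy \tag{3.10a}$$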

By expanding the square in (3.10a) it is easy to show that the variance
can also be written as the difference between the average of the square
of y and the square of the average of y:

$$\sigma^2 = \int_{-\infty}^{\infty} y^2\,\tilde{P}(y)\,dy - m^2 = \langle y^2:\tilde{P} \rangle - \langle y:\tilde{P} \rangle^2 \tag{3.10b}$$
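
For completeness, the expansion behind (3.10b) can be written out in one line. The only facts used are the linearity of the average, the definition $m = \langle y:\tilde{P} \rangle$ from (3.9), and the normalization of the density function, $\langle 1:\tilde{P} \rangle = 1$:

```latex
\begin{aligned}
\sigma^2 &= \langle (y-m)^2 : \tilde{P} \rangle
          = \langle y^2 : \tilde{P} \rangle
            - 2m \langle y : \tilde{P} \rangle
            + m^2 \langle 1 : \tilde{P} \rangle \\
         &= \langle y^2 : \tilde{P} \rangle - 2m^2 + m^2
          = \langle y^2 : \tilde{P} \rangle - \langle y : \tilde{P} \rangle^2 .
\end{aligned}
```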

The square root of the variance, namely $\sigma$, is called the root-mean-square
(or rms or standard) deviation of the set $\{y_i\}$. If the graph of $\tilde{P}(y)$-versus-y
consists mainly of a single hump, as is shown in Fig. 7, then (3.9) and (3.10a)
imply that m and $\sigma$ characterize respectively the "center" and "width" of the
graph; roughly speaking, we may "reasonably expect" that a randomly drawn
number from the set $\{y_i\}$ will lie somewhere between $m-\alpha\sigma$ and $m+\alpha\sigma$, where $\alpha$ is
of order unity.


Now, suppose we construct from the set $\{y_i\}$ another set of random numbers
$\{y_i^{(N)}\}$ according to the rule

$$y_i^{(N)} = (y_1 + y_2 + \cdots + y_N)/N \tag{3.11}$$

That is, each element of the set $\{y_i^{(N)}\}$ is the simple average of N randomly
drawn elements of the set $\{y_i\}$. We now ask, what can be said concerning the
distribution of the set $\{y_i^{(N)}\}$?


In Appendix C [cf. (C.14)-(C.16)] we derive a partial answer to this
question: the mean and rms deviation of the set $\{y_i^{(N)}\}$ are equal to m and
$\sigma/\sqrt{N}$ respectively, where m and $\sigma$ are as before the mean and rms deviation
of the original set $\{y_i\}$. However, for Monte Carlo purposes it is necessary
to have a somewhat more detailed knowledge of the distribution of the set
$\{y_i^{(N)}\}$; specifically, we would like to be able to translate the rms deviation
$\sigma/\sqrt{N}$ of this set into a specification of numerical confidence limits. For
this we must resort to the Central Limit Theorem [see p. 244 of Ref. 8],
which says in effect that in the limit of large N the set $\{y_i^{(N)}\}$ becomes
a Gaussian (or "normal") distribution, with (as just asserted) mean m
and rms deviation $\sigma/\sqrt{N}$. More precisely, the Central Limit Theorem asserts
that

$$\lim_{N\to\infty} \mathrm{Prob}\left\{ \left| y_i^{(N)} - m \right| \le \alpha \frac{\sigma}{\sqrt{N}} \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\alpha}^{\alpha} \exp(-t^2/2)\,dt = \begin{cases} 0.683 & \text{for } \alpha=1 \\ 0.955 & \text{for } \alpha=2 \\ 0.997 & \text{for } \alpha=3 \end{cases} \tag{3.12}$$


Thus, for example, provided N is sufficiently large the probability that
the average of N randomly drawn elements from the set $\{y_i\}$ will lie between
$m-2\sigma/\sqrt{N}$ and $m+2\sigma/\sqrt{N}$ is 0.955, where m and $\sigma$ are respectively the mean and
rms deviation of $\{y_i\}$. The power of this result is that it holds irrespective
of the form of the density function $\tilde{P}(y)$ of $\{y_i\}$. However, the Central
Limit Theorem also has a notable limitation: it does not specify how large
N must be in order that the lim symbol in (3.12) can be ignored for practical
purposes. Presumably, the rate of approach to the limit in (3.12) will
depend in some complicated way upon the specific form of the density function
$\tilde{P}(y)$, but the Central Limit Theorem gives us no information in this regard.
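
The content of (3.12) is easy to probe numerically. The sketch below is an illustration under our own assumptions: the set $\{y_i\}$ is taken to be uniformly distributed on the unit interval, for which m = 1/2 and $\sigma^2$ = 1/12, and the function names are hypothetical. It compares the Gaussian integral in (3.12) with the observed fraction of N-term averages that fall within $\alpha\sigma/\sqrt{N}$ of m.

```python
import math
import random


def gaussian_coverage(alpha):
    """Right side of (3.12): (1/sqrt(2 pi)) * integral of exp(-t^2/2) over (-alpha, alpha)."""
    return math.erf(alpha / math.sqrt(2.0))


def empirical_coverage(alpha, n_avg, n_trials, rng):
    """Fraction of N-term averages of uniform(0,1) numbers within alpha*sigma/sqrt(N) of m."""
    m = 0.5
    sigma = math.sqrt(1.0 / 12.0)
    half_width = alpha * sigma / math.sqrt(n_avg)
    hits = 0
    for _ in range(n_trials):
        average = sum(rng.random() for _ in range(n_avg)) / n_avg
        if abs(average - m) <= half_width:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    rng = random.Random(2)
    for alpha in (1.0, 2.0, 3.0):
        print(alpha, gaussian_coverage(alpha), empirical_coverage(alpha, 30, 5000, rng))
```

For this particular density the agreement is already close at N = 30; as the text stresses, nothing guarantees that the same N is adequate for other densities.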

Now let us see how the foregoing results allow us to quantitatively
estimate the uncertainty associated with the crucial approximation (3.7).
Essentially we have taken a set of random points $\{x_i\}$ distributed in
n-dimensional space according to the density function P(x), and we have
constructed a set of random numbers $\{y_i\}$ by subjecting $\{x_i\}$ to the
transformation

$$y = f(x) \tag{3.13}$$

This set of random numbers $\{y_i\}$ will have some density function $\tilde{P}(y)$, the
form of which is completely determined by the two functions P(x) and f(x).
In practice, it is virtually impossible to calculate the shape of $\tilde{P}(y)$
analytically from P(x) and f(x); just to illustrate what would be involved
in such a calculation, we sketch in Fig. 8c the density function $\tilde{P}(y)$
which would result in the case where x is one-dimensional and the functions
P(x) and f(x) are those sketched in Figs. 8a and 8b respectively.†

† Given a set of random points $\{x_i\}$ distributed according to the density function
P(x) in Fig. 8a, then the transformation y = f(x) in Fig. 8b produces a set of
random numbers $\{y_i\}$ whose density function $\tilde{P}(y)$ is obtained from the general
rule $\tilde{P}(y)\,dy = P(x)\,dx$, i.e., $\tilde{P}(y) = P(x)/|f'(x)|$ where $x = f^{-1}(y)$, with proper account
being taken of the multi-valuedness of the inverse function $f^{-1}(y)$. Thus, for
Figs. 8a and 8b we have $\tilde{P}(y) = \tilde{P}_1(y) + \tilde{P}_2(y)$, where

$$\tilde{P}_1(y) = \begin{cases} P(x)/|f'(x)|, & \text{for } x < c \\ 0, & \text{for } x > c \end{cases} \qquad \text{and} \qquad \tilde{P}_2(y) = \begin{cases} 0, & \text{for } x < c \\ P(x)/|f'(x)|, & \text{for } x > c \end{cases}$$

The resultant $\tilde{P}(y)$ is sketched in Fig. 8c. The jump discontinuity in $\tilde{P}(y)$
at y=B is due to the fact that the interval $B < y < C$ is fully populated by $y_i$'s
coming, not only from $x_i$'s in the interval $a < x < c$, but also from $x_i$'s in the
interval $c < x < d$. The infinite peak in $\tilde{P}(y)$ at y=C is due to the fact that
$f'(c)=0$. The hump in $\tilde{P}(y)$ near y=A is a reflection of the hump in P(x) near
x=b.

FIGURE 8. Illustrating how the density function $\tilde{P}(y)$ of the values
y=f(x) is determined by the density function P(x) and the
function f(x), in a simple case in which x is one-dimensional.
(See the preceding footnote.)

But in any case, it is clear that the mean m of the set $\{y_i\}$, i.e., the "center"
of the curve of $\tilde{P}(y)$-versus-y, coincides with the average of f(x) with
respect to the set $\{x_i\}$:

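$$m \equiv \langle y:\tilde{P} \rangle \equiv \lim_{K\to\infty} \frac{1}{K}\sum_{j=1}^{K} y_j = \lim_{K\to\infty} \frac{1}{K}\sum_{j=1}^{K} f(x_j) \equiv \langle f:P \rangle \tag{3.14a}$$

Similarly, the rms deviation $\sigma$ of the set $\{y_i\}$, i.e., the "width" of the
curve of $\tilde{P}(y)$-versus-y, is given by

$$\sigma = \sqrt{\langle y^2:\tilde{P} \rangle - \langle y:\tilde{P} \rangle^2} = \sqrt{\langle f^2:P \rangle - \langle f:P \rangle^2} \tag{3.14b}$$

And finally, the average $y_i^{(N)}$ of N random elements of $\{y_i\}$ is seen to coincide
with the calculated quantity $\langle f:P \rangle_N$:

$$y_i^{(N)} \equiv \frac{1}{N}\sum_{j=1}^{N} y_j = \frac{1}{N}\sum_{j=1}^{N} f(x_j) \equiv \langle f:P \rangle_N \tag{3.14c}$$

In view of Eqs. (3.14) we thus see that the Central Limit Theorem in (3.12)
implies that

$$\lim_{N\to\infty} \mathrm{Prob}\left\{ \left| \langle f:P \rangle_N - \langle f:P \rangle \right| \le \alpha \frac{\sigma}{\sqrt{N}} \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\alpha}^{\alpha} \exp(-t^2/2)\,dt \tag{3.15}$$

where

$$\sigma = \sqrt{\langle f^2:P \rangle - \langle f:P \rangle^2} \tag{3.16}$$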

In essence, then, we may say that for N sufficiently large the approximation
in (3.7) has a Gaussian rms uncertainty of $\sigma/\sqrt{N}$, where $\sigma$ is given by
(3.16). Of course, since we do not know $\langle f:P \rangle$ beforehand, much less $\langle f^2:P \rangle$,
it is clear that we do not know $\sigma$ beforehand either. However, if N
is sufficiently large so that the approximation $\langle f:P \rangle_N \approx \langle f:P \rangle$ is good,
then we may also put $\langle f^2:P \rangle_N \approx \langle f^2:P \rangle$, and thereby take

$$\sigma \approx \sqrt{\langle f^2:P \rangle_N - \langle f:P \rangle_N^2} \tag{3.17}$$

as a sufficiently accurate approximation to $\sigma$ for the purposes of (3.15).

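In summary, then, we have the following "Golden Rule of Monte Carlo":
If $x_1, x_2, \ldots, x_N$ are N random points distributed according to the probability
density function P(x), and if for a given function f(x) we put

$$\langle f:P \rangle_N \equiv \frac{1}{N}\sum_{i=1}^{N} f(x_i) \qquad \text{and} \qquad \langle f^2:P \rangle_N \equiv \frac{1}{N}\sum_{i=1}^{N} f^2(x_i) \tag{3.18}$$

then provided N is sufficiently large we have

$$\int f(x) P(x)\,dx \approx \langle f:P \rangle_N \pm \sqrt{\frac{\langle f^2:P \rangle_N - \langle f:P \rangle_N^2}{N}} \tag{3.19}$$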


The ± quantity in (3.19) is understood to correspond to a "68% confidence
interval" [cf. (3.12) with $\alpha=1$], and should thus be typical of the average spread between several
independent evaluations of $\langle f:P \rangle_N$. If we double the ± uncertainty in (3.19)
we obtain a "95% confidence interval", which is perhaps a more suitable
uncertainty to use when asserting a value for the integral; and if we
treble the ± uncertainty, we obtain an even more conservative "99% confidence
interval". Note that in the limit $N\to\infty$ (3.19) indeed goes over into (3.3).
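
The Golden Rule applies to any density function P(x), not only a uniform one. As a hedged illustration, the sketch below applies (3.18) and (3.19) with the exponential density $P(x)=e^{-x}$ on $x>0$ and $f(x)=x^2$, for which the integral of f(x)P(x) is exactly 2. The example integral, the function name `golden_rule` and the use of the standard-library sampler `random.expovariate` are our own choices.

```python
import math
import random


def golden_rule(f, draw, n_points):
    """Return (N-term average, one-sigma uncertainty) following (3.18)-(3.19).

    draw() must return one random point distributed according to P(x).
    """
    s1 = s2 = 0.0
    for _ in range(n_points):
        y = f(draw())
        s1 += y
        s2 += y * y
    mean = s1 / n_points                      # N-term average of f
    mean_sq = s2 / n_points                   # N-term average of f^2
    variance = max(mean_sq - mean * mean, 0.0)   # guard against rounding below zero
    return mean, math.sqrt(variance / n_points)


if __name__ == "__main__":
    rng = random.Random(42)
    estimate, sigma = golden_rule(lambda x: x * x, lambda: rng.expovariate(1.0), 100_000)
    print(f"{estimate:.4f} +/- {sigma:.4f}  (exact value 2)")
```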


3-3.  Operational  Procedures 

In what follows we shall assume that we are presented with the problem
of integrating a bounded function f(x) over a finite region $\Omega$ in n-dimensional
x-space. Now,

$$\int_\Omega f(x)\,dx = |\Omega| \int_\Omega f(x)\,|\Omega|^{-1}\,dx = |\Omega| \int_\Omega f(x) P_\Omega(x)\,dx$$

or

$$\int_\Omega f(x)\,dx = |\Omega| \int_{\infty} f(x) P_\Omega(x)\,dx \tag{3.20}$$


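where $|\Omega|$ is the volume of $\Omega$ and $P_\Omega(x)$ is the density function (3.4) defining
the uniform distribution of random points inside $\Omega$. Substituting (3.19)
into the right side of (3.20) then yields the result

$$\int_\Omega f(x)\,dx \approx |\Omega| \left[ \langle f:P_\Omega \rangle_N \pm \sqrt{\frac{\langle f^2:P_\Omega \rangle_N - \langle f:P_\Omega \rangle_N^2}{N}} \right] \tag{3.21}$$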


The procedure for numerically estimating the integral of a bounded
function f(x) over a finite region $\Omega$ of x-space therefore consists of the
following steps, which we state in the "iterative language" of linear
computer programming:

1° Initialize $S_1=0$, $S_2=0$, i=0.

2° Generate a random point x from the uniform distribution inside $\Omega$ [i.e.,
according to the density function $P_\Omega(x)$ in (3.4)].

3° Evaluate y=f(x) for the generated point x.

4° Put $S_1=S_1+y$, $S_2=S_2+y^2$, i=i+1.

5° If i is less than N go to step 2°; if i=N go to step 6°.          (3.22)

6° Put $\langle f:P_\Omega \rangle_N = S_1/N$ and $\langle f^2:P_\Omega \rangle_N = S_2/N$, and obtain the Monte Carlo estimate
of the integral from (3.21). Remember that the ± uncertainty in
(3.21) represents a 68% confidence interval; in a final quotation it
is usually best to double this uncertainty to obtain a 95% confidence
interval.
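
The six steps translate almost line for line into code. The following sketch is one possible rendering, not a definitive implementation; the function name, the argument names and the test integral (f = xy over the box 0 to 2 in x and 0 to 3 in y, whose exact value is 9) are our own assumptions.

```python
import math
import random


def monte_carlo_integral(f, sample_uniform, volume, n_points):
    """Estimate the integral of f over a region of known volume, following steps 1-6.

    sample_uniform() must return one point drawn uniformly from the region.
    Returns (estimate, one-sigma uncertainty) as in (3.21).
    """
    s1, s2, i = 0.0, 0.0, 0                      # Step 1
    while i < n_points:                          # Step 5 (loop test)
        x = sample_uniform()                     # Step 2
        y = f(x)                                 # Step 3
        s1, s2, i = s1 + y, s2 + y * y, i + 1    # Step 4
    mean = s1 / n_points                         # Step 6
    mean_sq = s2 / n_points
    sigma = math.sqrt(max(mean_sq - mean * mean, 0.0) / n_points)
    return volume * mean, volume * sigma


if __name__ == "__main__":
    rng = random.Random(9)
    draw = lambda: (rng.uniform(0.0, 2.0), rng.uniform(0.0, 3.0))
    estimate, sigma = monte_carlo_integral(lambda p: p[0] * p[1], draw, 6.0, 50_000)
    print(f"{estimate:.3f} +/- {2.0 * sigma:.3f}  (95% interval; exact value 9)")
```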
The foregoing steps are schematized in the flow diagram of Fig. 9.


It will be observed that the above procedure always yields numerical
results for the estimate and its associated uncertainty, even if N has not
been chosen large enough to satisfy the limit requirement in (3.15) and the
$\sigma$ approximation in (3.17). It is important to realize that, unless N is
large enough for these two approximations to be valid, then one may not
assert that the calculated estimate is indeed accurate to within the calculated
uncertainty with the numerical confidence limits prescribed in
(3.12). Unfortunately, there is no way of telling from the results of a
single Monte Carlo calculation whether N has been taken large enough to
produce reliable results. One fairly convenient way of checking this important
point is to proceed as follows: Choose a value of N such that one
can afford to perform the Monte Carlo calculation four times, each time
using a different set of N random points. If the uncertainties obtained
in these four runs agree to within better than 10% (i.e., to within at least
the first significant digit), and if further the four estimates each differ
from their average by no more than roughly twice the uncertainty, then one
usually can accept these results subject to the confidence limits prescribed
in (3.12). One may then quote as the estimate of the integral
the average of the estimates found in the four separate runs. Since this
average evidently corresponds to a total of 4N points, then its one-standard
deviation uncertainty will be exactly half that of the individual runs;
thus, the one-standard deviation uncertainty found in the four individual
runs may be quoted as the two-standard deviation uncertainty (giving 95%
confidence limits) for the average of the estimates found in the individual
runs.†

FIGURE 9. Flow diagram of the basic steps involved in the
Monte Carlo evaluation of an integral.

If, on the other hand, the results of the four repeated Monte Carlo runs
are not mutually compatible in the sense described above, then one should
tentatively assume that N is not large enough to produce reliable results.
If increasing N fails to improve the situation, then one should suspect the
existence of either (i) a programming error, or (ii) a singularity in the
integrand. The latter circumstance is of course intolerable; for, if f(x)
approaches infinity as x approaches some point $x_c$ in $\Omega$ (but in such a way
that $\int_\Omega f(x)\,dx$ is nevertheless finite), then the Monte Carlo estimate of the
integral can be arbitrarily large, depending solely upon how close one happens
to come to $x_c$ while picking points at random in $\Omega$.

Our main point here, though, is that the entire Monte Carlo method is
predicated on the assumption that the number of random points sampled is
"sufficiently large"; therefore, one should never accept the numerical results
of the computational procedure (3.22) [or Fig. 9] without being reasonably
confident that N is indeed "sufficiently large". One way of checking
this point, as suggested above, is to require consistency among the results
of several repeated calculations. In the sequel, we shall always assume that
this or some equivalent check has been performed.
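
The four-run consistency check lends itself to a small helper. The sketch below is only one reading of the criteria stated above: "agree to within better than 10%" is interpreted as the spread of the four uncertainties not exceeding 10% of their mean, and "roughly twice the uncertainty" as exactly twice the mean uncertainty. Both interpretations, the one-dimensional test integral and all names are our own assumptions.

```python
import math
import random


def one_run(f, lo, hi, n_points, rng):
    """One N-point run of (3.21) for the one-dimensional region (lo, hi)."""
    s1 = s2 = 0.0
    for _ in range(n_points):
        y = f(rng.uniform(lo, hi))
        s1 += y
        s2 += y * y
    mean, mean_sq = s1 / n_points, s2 / n_points
    width = hi - lo
    return width * mean, width * math.sqrt(max(mean_sq - mean * mean, 0.0) / n_points)


def four_run_check(f, lo, hi, n_points, seed=0):
    """Run four independent N-point calculations and test their mutual consistency."""
    runs = [one_run(f, lo, hi, n_points, random.Random(seed + j)) for j in range(4)]
    estimates = [e for e, _ in runs]
    sigmas = [s for _, s in runs]
    average = sum(estimates) / 4.0
    mean_sigma = sum(sigmas) / 4.0
    sigmas_agree = (max(sigmas) - min(sigmas)) <= 0.10 * mean_sigma
    estimates_agree = all(abs(e - average) <= 2.0 * mean_sigma for e in estimates)
    # The single-run one-sigma uncertainty serves as the two-sigma (95%)
    # uncertainty of the four-run average, which is based on 4N points.
    return {"estimate": average, "two_sigma": mean_sigma,
            "accepted": sigmas_agree and estimates_agree}


if __name__ == "__main__":
    print(four_run_check(lambda x: x * x, 0.0, 1.0, 20_000))   # exact value 1/3
```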


† An additional advantage of splitting a 4N-point Monte Carlo run into four
separate N-point runs is that it considerably reduces the possible loss
which might result from a computer malfunction or control card error.

A common source of difficulty in executing the six-step procedure of
(3.22) is generating a random point from a uniform distribution inside $\Omega$
(Step 2°). If $\Omega$ is a box-like region then the calculation of $|\Omega|$ and the
generation of random points uniformly inside $\Omega$ is very easy [cf. (2.33)].
However, it frequently happens that $\Omega$ is not a box-like region, and that
we are not able to calculate $|\Omega|$; this will usually be the case if $\Omega$ is
defined by a set of inequalities involving various functions of the components
of x which are not "neatly ordered" as in (2.31a). In such a case
the following procedure can sometimes be used. Find some box-like region
$\Sigma$ which completely encloses $\Omega$, and define a new function g(x) in $\Sigma$ which
coincides with f inside $\Omega$ but which vanishes outside $\Omega$:

$$\Sigma \text{ (box-like) encloses } \Omega; \qquad g(x) = \begin{cases} f(x), & \text{if } x \in \Omega \\ 0, & \text{if } x \notin \Omega \end{cases} \tag{3.23}$$


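Clearly, the integral of g(x) over $\Sigma$ is equal to the integral of f(x)
over $\Omega$; thus, the procedure (3.22) can be carried out with f and $\Omega$ replaced
by g and $\Sigma$ respectively, and (3.21) becomes

$$\int_\Omega f(x)\,dx \approx |\Sigma| \left[ \langle g:P_\Sigma \rangle_N \pm \sqrt{\frac{\langle g^2:P_\Sigma \rangle_N - \langle g:P_\Sigma \rangle_N^2}{N}} \right] \tag{3.24}$$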


Of course, we should not expect this procedure to be very efficient if
$|\Omega| \ll |\Sigma|$, and for this reason we should always take $\Sigma$ to be the smallest
box enclosing $\Omega$.
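
The enclosing-box device of (3.23) and (3.24) can be sketched as follows. As an assumed test case we take the region to be the unit disk, the enclosing box to be the square from -1 to 1 in each coordinate, and $f = x^2+y^2$, whose integral over the disk is $\pi/2$; the function and argument names are hypothetical.

```python
import math
import random


def box_method(f, inside, box, n_points, rng):
    """Estimate the integral of f over Omega using an enclosing box, as in (3.24).

    box is a list of (lo, hi) pairs, one per coordinate; inside(x) tests membership in Omega.
    """
    box_volume = 1.0
    for lo, hi in box:
        box_volume *= hi - lo
    s1 = s2 = 0.0
    for _ in range(n_points):
        x = tuple(rng.uniform(lo, hi) for lo, hi in box)
        g = f(x) if inside(x) else 0.0           # the function g(x) of (3.23)
        s1 += g
        s2 += g * g
    mean, mean_sq = s1 / n_points, s2 / n_points
    sigma = math.sqrt(max(mean_sq - mean * mean, 0.0) / n_points)
    return box_volume * mean, box_volume * sigma


if __name__ == "__main__":
    in_disk = lambda p: p[0] * p[0] + p[1] * p[1] <= 1.0
    square = [(-1.0, 1.0), (-1.0, 1.0)]
    print(box_method(lambda p: p[0] ** 2 + p[1] ** 2, in_disk, square, 50_000, random.Random(8)))
```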

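A frequently useful example of the foregoing procedure is the calculation
of the volume $|\Omega|$ itself. Since

$$\int_\Omega 1 \cdot dx = |\Omega|$$

then we have from (3.23) and (3.24)

$$|\Omega| \approx |\Sigma| \left[ \langle g:P_\Sigma \rangle_N \pm \sqrt{\frac{\langle g^2:P_\Sigma \rangle_N - \langle g:P_\Sigma \rangle_N^2}{N}} \right] \tag{3.25a}$$

where

$$g(x) = \begin{cases} 1, & \text{if } x \in \Omega \\ 0, & \text{if } x \notin \Omega \end{cases} \tag{3.25b}$$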


Now suppose that, of the N points generated randomly and uniformly inside $\Sigma$,
M of them are found to fall inside $\Omega$. It follows from (3.25b) that the sums
$S_1$ and $S_2$ computed in accordance with Step 4° of (3.22) will both equal M,
so that in Step 6° we will find

$$\langle g:P_\Sigma \rangle_N = \langle g^2:P_\Sigma \rangle_N = M/N \tag{3.26}$$


Substituting into (3.25a) yields the result

$$|\Omega| \approx |\Sigma|\,\frac{M}{N}\left[ 1 \pm \frac{1}{\sqrt{N}} \sqrt{\frac{N-M}{M}} \right] \tag{3.27}$$

where, again, of N uniformly distributed random points inside $\Sigma$, M are
found to lie inside $\Omega$. According to (3.27), $|\Omega|$ is given approximately
by $(M/N)|\Sigma|$, just as we expect. But (3.27) also provides us with an
estimate of the uncertainty in this approximation. Evidently, the relative
uncertainty is equal to $1/\sqrt{N}$ times the square root of the ratio of the
number of "miss points" N-M to the number of "hit points" M; therefore,
as mentioned earlier, the uncertainty will be large if $M \ll N$, or $|\Omega| \ll |\Sigma|$.
In fact, by putting $M/N \approx |\Omega|/|\Sigma|$ in the relative uncertainty term in
(3.27), it is easy to show that if $|\Omega|$ is n (>1) orders of magnitude smaller
than $|\Sigma|$, then N will have to be of order $10^{n+2}$ to obtain an estimate of
$|\Omega|$ having a 10% relative uncertainty, and of order $10^{n+4}$ to obtain an
estimate having a 1% relative uncertainty.
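
The hit-or-miss volume estimate, and the accompanying estimate of how many points are needed, can be sketched as follows. The unit disk inside the square from -1 to 1 (exact area $\pi$) is our own test case, and the helper `points_needed` simply solves the relative-uncertainty term $\sqrt{(N-M)/(MN)}$ for N after putting M/N equal to the volume ratio.

```python
import math
import random


def volume_hit_or_miss(inside, box, n_points, rng):
    """Estimate |Omega| and its one-sigma uncertainty from hit counts, as in (3.27)."""
    box_volume = 1.0
    for lo, hi in box:
        box_volume *= hi - lo
    hits = 0
    for _ in range(n_points):
        x = tuple(rng.uniform(lo, hi) for lo, hi in box)
        if inside(x):
            hits += 1
    if hits == 0:
        return 0.0, math.inf       # no hit points: the relative uncertainty is undefined
    estimate = box_volume * hits / n_points
    relative = math.sqrt((n_points - hits) / hits) / math.sqrt(n_points)
    return estimate, estimate * relative


def points_needed(volume_ratio, relative_uncertainty):
    """N giving the requested relative uncertainty when M/N equals |Omega|/|Sigma|."""
    return (1.0 - volume_ratio) / (volume_ratio * relative_uncertainty ** 2)


if __name__ == "__main__":
    in_disk = lambda p: p[0] * p[0] + p[1] * p[1] <= 1.0
    print(volume_hit_or_miss(in_disk, [(-1.0, 1.0), (-1.0, 1.0)], 100_000, random.Random(11)))
    print(points_needed(1e-3, 0.10), points_needed(1e-3, 0.01))   # about 1e5 and 1e7
```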

Returning to the problem of integrating a general function f(x) over
a non-box region $\Omega$, suppose the volume $|\Omega|$ is known exactly, but that it
is nevertheless not practical to generate uniformly distributed random
points x inside $\Omega$ by the direct (inversion) method. In such a case one
could of course still use the box-method described in connection with
(3.23) and (3.24); alternatively, one could use (3.21) in conjunction
with the rejection method for generating random points uniformly inside
$\Omega$. Thus, using the same enclosing box $\Sigma$ as in (3.23) and (3.24), suppose
that, of N random points generated uniformly inside the box $\Sigma$, M are found
to fall inside $\Omega$. Then these latter M points can be used to calculate the
quantities $\langle f:P_\Omega \rangle_M$ and $\langle f^2:P_\Omega \rangle_M$, and we may put in accordance with (3.21)

$$\int_\Omega f(x)\,dx \approx |\Omega| \left[ \langle f:P_\Omega \rangle_M \pm \sqrt{\frac{\langle f^2:P_\Omega \rangle_M - \langle f:P_\Omega \rangle_M^2}{M}} \right] \tag{3.28}$$


It is interesting to compare the approach of (3.28) with that of (3.24).
Given the precise values of both volumes $|\Omega|$ and $|\Sigma|$, it should be obvious
that calculating according to (3.28) involves exactly the same amount of
work as calculating according to (3.24); indeed, it is clear that the
sums $S_1$ and $S_2$ calculated in Step 4° of (3.22) will be the same for (3.28)
as for (3.24), because the N-M points $x_i$ not used in these sums for
(3.28) contribute zero to these sums for (3.24). In terms of these common
sums $S_1$ and $S_2$, (3.24) and (3.28) can be written respectively as

$$\int_\Omega f(x)\,dx \approx |\Sigma|\,\frac{S_1}{N}\left[ 1 \pm \sqrt{\frac{S_2}{S_1^2} - \frac{1}{N}} \right] \tag{3.29a}$$

and

$$\int_\Omega f(x)\,dx \approx |\Omega|\,\frac{S_1}{M}\left[ 1 \pm \sqrt{\frac{S_2}{S_1^2} - \frac{1}{M}} \right] \tag{3.29b}$$


Since in the limit N→∞ we expect |Ω|/|E| = M/N, the averages or central
values in (3.29a) and (3.29b) are indeed consistent. But as M is always less
than N, the relative uncertainty in (3.29b) is always less than that
in (3.29a). If f(x) is approximately constant in Ω we will have S_1² ≈ M S_2,
so that the relative uncertainty in (3.29a) will be approximately that given
in (3.27) whereas the relative uncertainty in (3.29b) will be approximately
zero. Thus, if |Ω| is known exactly, then it is usually a bit more efficient
to make use of this knowledge and proceed via (3.28) rather than via (3.24).
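
The comparison between the two formulas can be tried numerically. The following is a minimal Python sketch, not taken from the text: it assumes that S_1 and S_2 are the sums of f and f² over the accepted points, and that (3.29a) and (3.29b) take the forms used in the comments. The function name `box_vs_known_volume` and its parameters are hypothetical.

```python
import math
import random

def box_vs_known_volume(f, inside, box, vol_omega, n, seed=0):
    """Estimate the integral of f over Omega two ways from one set of box points.

    f         : integrand, called as f(point)
    inside    : predicate, True when the point lies in Omega
    box       : list of (low, high) pairs describing the enclosing box E
    vol_omega : exact volume |Omega| (assumed known)
    Returns ((estimate_a, uncertainty_a), (estimate_b, uncertainty_b), M).
    """
    rng = random.Random(seed)
    vol_box = math.prod(hi - lo for lo, hi in box)
    m = 0
    s1 = 0.0  # assumed S_1: sum of f over the accepted points
    s2 = 0.0  # assumed S_2: sum of f**2 over the accepted points
    for _ in range(n):
        p = [rng.uniform(lo, hi) for lo, hi in box]
        if inside(p):
            m += 1
            y = f(p)
            s1 += y
            s2 += y * y
    # Box method, assumed form of (3.29a): rejected points count as zeros.
    est_a = vol_box * s1 / n
    unc_a = vol_box * math.sqrt(max(s2 - s1 * s1 / n, 0.0)) / n
    # Known-volume method, assumed form of (3.29b): average over M points only.
    est_b = vol_omega * s1 / m
    unc_b = vol_omega * math.sqrt(max(s2 - s1 * s1 / m, 0.0)) / m
    return (est_a, unc_a), (est_b, unc_b), m

if __name__ == "__main__":
    # Nearly constant integrand on the unit disk, enclosed in [-1, 1] x [-1, 1].
    f = lambda p: 1.0 + 0.1 * (p[0] ** 2 + p[1] ** 2)
    inside = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
    print(box_vs_known_volume(f, inside, [(-1.0, 1.0)] * 2, math.pi, 20000))
```

For a nearly constant integrand such as this one, the second uncertainty should come out much smaller than the first, which is the behavior described above.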

If the volume of Ω is very much smaller than the volume of the smallest
box E which encloses Ω, then clearly neither (3.28) nor (3.24) will be
satisfactory. In such a case one should endeavor to calculate |Ω|
analytically, and in the process also calculate the distribution functions
F(x^(1)), F(x^(2)|x^(1)), ..., F(x^(n)|x^(1),...,x^(n-1)) which allow one to
use the generalized inversion method for directly generating random points
uniformly inside Ω [cf. (2.34) and (2.29)]. Alternatively, one can try to
find some transformation of variables x → x' which carries the region Ω into
a more suitably shaped region Ω'. Given such a transformation (take n=3 for
concreteness),


$$\begin{aligned}
x' &= x'(x,y,z) \\
y' &= y'(x,y,z) \\
z' &= z'(x,y,z)
\end{aligned} \qquad (3.30a)$$


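one could then express the given integral as an integral over Ω' according
to [cf. (B.9)]

$$\begin{aligned}
\iiint_\Omega f(x,y,z)\,dx\,dy\,dz &= \iiint_{\Omega'} f(x,y,z) \left| \frac{\partial(x,y,z)}{\partial(x',y',z')} \right| dx'\,dy'\,dz' \\
&= \iiint_{\Omega'} h(x',y',z')\,dx'\,dy'\,dz'
\end{aligned} \qquad (3.30b)$$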


Here, the last step is carried out after solving (3.30a) for x, y and z in
terms of x', y' and z'. It is interesting to note that the inversion method
for generating random points uniformly inside Ω, in which one calculates
the one-dimensional distribution functions F(x), F(y|x), F(z|x,y) for the
uniform density function P_Ω(x,y,z) and puts


$$\begin{aligned}
r_1 &= F(x) \\
r_2 &= F(y|x) \\
r_3 &= F(z|x,y)
\end{aligned} \qquad (3.31a)$$


can also be regarded as a transformation of variables from xyz-space to
r_1r_2r_3-space. This transformation has the special properties that (i) the
region Ω in xyz-space is transformed into the unit cube in r_1r_2r_3-space,
and (ii) the Jacobian of the transformation is simply [cf. (2.30) and
(2.32)]


$$\frac{\partial(x,y,z)}{\partial(r_1,r_2,r_3)} = \frac{1}{P_\Omega(x,y,z)} = \frac{1}{|\Omega|^{-1}} = |\Omega| \qquad (3.31b)$$


There are, of course, many transformations which will carry Ω into a unit
cube, but (3.31a) is the only one of these that has a constant Jacobian.
Another transformation to the unit cube which can be applied whenever Ω
is specified as in (2.31a) is the simple "linear stretching" transformation


$$\begin{aligned}
r_1 &= [x - a_1] / [b_1 - a_1] \\
r_2 &= [y - a_2(x)] / [b_2(x) - a_2(x)] \\
r_3 &= [z - a_3(x,y)] / [b_3(x,y) - a_3(x,y)]
\end{aligned} \qquad (3.32a)$$


the  Jacobian  of  which  is  easily  calculated  as 


$$\frac{\partial(x,y,z)}{\partial(r_1,r_2,r_3)} = [b_1 - a_1]\,[b_2(x) - a_2(x)]\,[b_3(x,y) - a_3(x,y)] \qquad (3.32b)$$


Whether (3.31a) or (3.32a) is the better transformation to the unit cube
depends on the form of the integrand, as we shall see more clearly in our
discussion of "importance sampling" in Secs. 4-5 and 4-6.
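
The linear stretching transformation can be sketched in code for a two-dimensional region, which is enough to show the role of the Jacobian. This is an illustrative Python sketch under stated assumptions: the region is taken as a_1 ≤ x ≤ b_1, a_2(x) ≤ y ≤ b_2(x), the two-dimensional analogue of (3.32a) and (3.32b) is used, and the name `stretch_integrate` is hypothetical.

```python
import math
import random

def stretch_integrate(f, a1, b1, a2, b2, n, seed=0):
    """Monte Carlo integral of f(x, y) over a1 <= x <= b1, a2(x) <= y <= b2(x).

    Points (r1, r2) are uniform in the unit square; each is stretched back
    to (x, y) and the integrand is multiplied by the Jacobian of the map.
    Returns (estimate, one-sigma uncertainty); the unit square has volume 1.
    """
    rng = random.Random(seed)
    s1 = 0.0
    s2 = 0.0
    for _ in range(n):
        r1 = rng.random()
        r2 = rng.random()
        x = a1 + r1 * (b1 - a1)            # inverts r1 = [x - a1]/[b1 - a1]
        width_y = b2(x) - a2(x)
        y = a2(x) + r2 * width_y           # inverts r2 = [y - a2(x)]/[b2(x) - a2(x)]
        jacobian = (b1 - a1) * width_y     # two-dimensional analogue of (3.32b)
        h = f(x, y) * jacobian
        s1 += h
        s2 += h * h
    mean = s1 / n
    var = max(s2 / n - mean * mean, 0.0)
    return mean, math.sqrt(var / n)

if __name__ == "__main__":
    # Triangle 0 <= x <= 1, 0 <= y <= x; the exact integral of x*y is 1/8.
    print(stretch_integrate(lambda x, y: x * y, 0.0, 1.0,
                            lambda x: 0.0, lambda x: x, 20000))
```

Because the Jacobian here varies with x, the transformed integrand is not simply f; whether that helps or hurts the variance depends on the integrand, as the text notes.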

If, in the case where the volume of Ω is extremely small compared to
the volume of the smallest box enclosing Ω, all efforts at inversion
generating and variables transformation fail, then one may simply be forced
to conclude that the conventional Monte Carlo method is not applicable.

An unbounded integrand or an infinite integrating region (or both)
evidently presents a special problem. Usually the best procedure is to try
to find a transformation of variables [cf. (3.30)] which is such that the
transformed integrand (i.e., the old integrand times the Jacobian of the
transformation) is bounded, and the transformed integrating region is
finite. Then the methods outlined above may be applied.

If  the  given  integral  truly  exists,  then  many  such  transformations  to  a 
bounded  integrand  and  a finite  integrating  region  exist,  but  whether  or  not 
one  of  these  can  actually  be  found  is  another  matter.  In  certain  cases 
where  the  only  difficulty  is  an  infinite  integrating  region,  we  can  often 
just  use  the  more  general  Monte  Carlo  formula  (3.19).  For  example,  suppose 
we  have  to  evaluate  the  one-dimensional  integral 

$$I = \int_0^\infty f(x)\,dx \qquad (3.33a)$$

where f(x) can be written in the form

$$f(x) \equiv h(x)\,x^n \exp(-a x^2) \qquad (3.33b)$$

with n a non-negative integer, a > 0, and h(x) bounded for 0 ≤ x < ∞. Using
the density function P(x;n,a) defined in (2.51), we may evidently write
I as

$$I = A^{-1}(n,a) \int_0^\infty h(x)\,P(x;n,a)\,dx \qquad (3.34)$$

Since, as discussed in detail in Sec. 2-9, we know how to generate random
numbers x_i in 0 ≤ x < ∞ according to the density function P(x;n,a), we
may evidently proceed according to (3.19), wherein the volume of the
integrating region never enters explicitly. Note that if we use the straight
inversion method to generate random points x_i according to P(x;n,a), i.e.,
if we proceed by actually inverting

$$r = F(x;n,a) \qquad (3.35a)$$

for a given r = r_i, where F(x;n,a) is the distribution function in (2.52),
then we are in effect making a change of variable x → r: For, by (3.35a)

$$dr = F'(x;n,a)\,dx = P(x;n,a)\,dx \qquad (3.35b)$$

so that (3.34) can be transformed to

$$I = A^{-1}(n,a) \int_0^1 h(x)\,dr \qquad (3.36)$$

In (3.36) x is now understood to be the function of r obtained by inverting
(3.35a). As discussed in connection with (2.52), this inversion must be
done numerically in all cases except for n=1.
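
For the case n=1 the inversion can be written in closed form, which makes a compact illustration. The sketch below is in Python and rests on assumptions not shown in this section: P(x;1,a) is taken to be the normalized density 2a x exp(-a x²) on 0 ≤ x < ∞, so that A(1,a) = 2a and F(x;1,a) = 1 - exp(-a x²). The function name `integrate_half_line` is hypothetical.

```python
import math
import random

def integrate_half_line(h, a, n, seed=0):
    """Estimate I = integral from 0 to infinity of h(x) * x * exp(-a x^2) dx.

    Samples x from the assumed density P(x;1,a) = 2a x exp(-a x^2) by
    inverting r = F(x;1,a) = 1 - exp(-a x^2), then averages h(x) as in (3.36).
    Returns (estimate, one-sigma uncertainty).
    """
    rng = random.Random(seed)
    s1 = 0.0
    s2 = 0.0
    for _ in range(n):
        r = rng.random()                          # r in [0, 1), so 1 - r > 0
        x = math.sqrt(-math.log(1.0 - r) / a)     # closed-form inverse for n = 1
        y = h(x)
        s1 += y
        s2 += y * y
    mean = s1 / n
    var = max(s2 / n - mean * mean, 0.0)
    inv_norm = 1.0 / (2.0 * a)                    # assumed A^{-1}(1, a)
    return inv_norm * mean, inv_norm * math.sqrt(var / n)

if __name__ == "__main__":
    # With h(x) = exp(-x^2) and a = 1 the exact value is 1/4.
    print(integrate_half_line(lambda x: math.exp(-x * x), 1.0, 20000))
```

Note that the infinite extent of the integrating region never appears in the code; only the unit interval for r does.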

3-4.  Combining  Results  of  Monte  Carlo  Calculations 

It frequently happens that we are interested in the sum of two integrals
each of which has been computed independently in separate Monte Carlo
calculations. Consider, for example, the two integrals

$$I_k \equiv \int_{\Omega_k} f_k(x_k)\,dx_k, \quad k = 1 \text{ and } 2 \qquad (3.37)$$

where we allow even for the possibility that Ω_1 and Ω_2 may be of different
dimensionalities. If two independent Monte Carlo calculations have yielded
the results [cf. (3.21)]

$$I_k \approx \tilde{I}_k \pm \Delta_k, \quad k = 1 \text{ and } 2 \qquad (3.38)$$
then,  as  we  shall  show  below,  we  may  assert  that 

$$I_1 + I_2 \approx (\tilde{I}_1 + \tilde{I}_2) \pm \sqrt{\Delta_1^2 + \Delta_2^2} \qquad (3.39)$$


Note  in  particular  that,  although  the  estimate  of  the  sum  is  the  sum  of  the 
estimates,  the  uncertainty  in  the  sum  is  always  somewhat  less  than  the  sum  of 
the  uncertainties. 


To show that (3.39) is true we assume that, for k=1 and 2, Ĩ_k and Δ_k
have been calculated in the usual Monte Carlo way [cf. (3.21)] using a
"sufficiently large" number of uniformly distributed random points in Ω_k.
Then Ĩ_k is in fact a particular element of a set of random numbers {Ĩ_{k,i}}
which has a mean I_k and a variance Δ_k². Now clearly Ĩ_1+Ĩ_2 is a particular
element of a set of random numbers {Ĩ_{1,i}+Ĩ_{2,i}}, which is formed by adding
pairs of independently drawn random numbers from the two sets {Ĩ_{1,i}} and
{Ĩ_{2,i}}. The question is, what are the mean and variance of the set
{Ĩ_{1,i}+Ĩ_{2,i}}? The answer is provided by a well-known theorem of statistics,
proved in Appendix C, which says that the mean and variance of the sum of
two statistically independent sets of random numbers are equal respectively
to the sums of the means and variances of the two sets. Thus, the
set of random numbers {Ĩ_{1,i}+Ĩ_{2,i}} has mean I_1+I_2 and variance Δ_1²+Δ_2².
These facts evidently enable us to make the statement (3.39).
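
The rule (3.39) amounts to adding the estimates and adding the uncertainties in quadrature. A minimal Python sketch follows; the function name `combine_sum` is hypothetical, and the extension to more than two independent results is an assumption that follows from applying the same theorem repeatedly.

```python
import math

def combine_sum(results):
    """Combine independent Monte Carlo results given as (estimate, uncertainty).

    The estimates add directly; the uncertainties add in quadrature, which
    is valid only if the separate calculations are statistically independent.
    """
    results = list(results)
    total = sum(est for est, _ in results)
    unc = math.sqrt(sum(delta * delta for _, delta in results))
    return total, unc

if __name__ == "__main__":
    # Two independent results, 1.0 +/- 0.3 and 2.0 +/- 0.4.
    print(combine_sum([(1.0, 0.3), (2.0, 0.4)]))
```

The combined uncertainty is smaller than the plain sum of the two uncertainties, in line with the remark following (3.39).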

Another frequently encountered situation is the following: We are
presented with an integral I of the form

$$I = \int_\Omega f(x)\,dx \qquad (3.40a)$$

where f(x) is the sum

$$f(x) = f_1(x) + f_2(x) \qquad (3.40b)$$

It is desired to calculate by Monte Carlo methods not only the integral I
but also the two integrals

$$I_k \equiv \int_\Omega f_k(x)\,dx, \quad k = 1 \text{ and } 2 \qquad (3.41)$$

which evidently constitute I according to

$$I = I_1 + I_2 \qquad (3.42)$$


Letting {x_i} denote the set of random points distributed uniformly
over Ω [i.e., according to the density function P_Ω(x)], then the set of
random numbers {f_k(x_i)} has mean m_k and variance σ_k² given by

$$m_k = \langle f_k:P_\Omega\rangle$$

and

$$\sigma_k^2 = \langle f_k^2:P_\Omega\rangle - \langle f_k:P_\Omega\rangle^2$$

If we use N random points x_1, x_2, ..., x_N from {x_i} to compute the N-term
average <f_k:P_Ω>_N [cf. (3.6)], then if N is sufficiently large we can put

$$I_k \approx |\Omega| \left( \langle f_k:P_\Omega\rangle_N \pm \frac{\sigma_k}{\sqrt{N}} \right), \quad k = 1 \text{ and } 2 \qquad (3.43)$$


Now, if we use one set of N points from {x_i} to calculate <f_1:P_Ω>_N,
and a different set of N points to calculate <f_2:P_Ω>_N, then these two
estimates will be statistically independent; hence, we may invoke (3.39)
and obtain for I the result

$$I \approx |\Omega| \left( \langle f_1:P_\Omega\rangle_N + \langle f_2:P_\Omega\rangle_N \pm \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{N}} \right) \qquad (3.44)$$


Suppose, however, that we calculate <f_1:P_Ω>_N and <f_2:P_Ω>_N in (3.43)
using the same set of N points from {x_i}. This would obviously be desirable
from the standpoint of calculating only I_1 and I_2, since it would require
generating only N instead of 2N random points x_i. Of course, if this were
done then the random numbers <f_1:P_Ω>_N and <f_2:P_Ω>_N would not be
statistically independent, and we could therefore not assert the result
(3.44) for I. However, as we calculate <f_1:P_Ω>_N and <f_2:P_Ω>_N "in
parallel" (i.e., using the same set of N random points), it would clearly
require very little effort to also calculate, using (3.40b), the quantity
<f:P_Ω>_N. Then instead of (3.44) we could assert for I the usual Monte
Carlo result

$$I \approx |\Omega| \left( \langle f:P_\Omega\rangle_N \pm \frac{\sigma}{\sqrt{N}} \right) \qquad (3.45)$$


where

$$\sigma^2 = \langle f^2:P_\Omega\rangle - \langle f:P_\Omega\rangle^2$$

is the variance of the set {f(x_i)} = {f_1(x_i)+f_2(x_i)}. We note
that, because of (3.40b), the quantity <f:P_Ω>_N in (3.45) could also be
written as the sum of <f_1:P_Ω>_N and <f_2:P_Ω>_N, just as in (3.44). In other
words, the estimate

$$I \approx |\Omega| \left[ \langle f_1:P_\Omega\rangle_N + \langle f_2:P_\Omega\rangle_N \right]$$

is valid regardless of whether <f_1:P_Ω>_N and <f_2:P_Ω>_N are computed using
the same set or different sets of N random points from {x_i}. However, the
uncertainties in these estimates are not in general the same, because σ²
in (3.45) is not generally equal to σ_1²+σ_2² in (3.44).

In summary, then, we may calculate I_1, I_2 and I in either of two ways:
One way is to first make separate Monte Carlo calculations of I_1 and I_2
[cf. (3.43)] using two independent sets of N random points, and then assert
the result (3.44) for I. An alternate way is to make three simultaneous
"parallel" Monte Carlo calculations of I_1, I_2 and I [cf. (3.43) and (3.45)]
using a single set of N random points. In practice, the second method is
usually more efficient. The fact that it requires generating only half
as many points as the first method usually more than compensates for any
excess of σ over √(σ_1²+σ_2²); furthermore, σ sometimes turns out to be
considerably less than √(σ_1²+σ_2²).

The relation of σ to σ_1 and σ_2 is interesting, and deserves further
discussion. Of course, in any actual "parallel" Monte Carlo calculation,
we would simply approximate σ_1, σ_2 and σ by

$$\sigma_k^2 \approx \langle f_k^2:P_\Omega\rangle_N - \langle f_k:P_\Omega\rangle_N^2, \quad k = 1 \text{ and } 2$$

$$\sigma^2 \approx \langle f^2:P_\Omega\rangle_N - \langle f:P_\Omega\rangle_N^2$$

and let these values speak for themselves. However, it is instructive to
examine under what conditions and by how much σ² will be greater than or
less than σ_1²+σ_2². In Appendix D we prove that [cf. (D.5)]

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{cov}(f_1,f_2:P_\Omega) \qquad (3.46)$$


where the covariance of f_1(x) and f_2(x) with respect to P_Ω(x) is defined by

$$\mathrm{cov}(f_1,f_2:P_\Omega) \equiv \langle f_1 f_2:P_\Omega\rangle - \langle f_1:P_\Omega\rangle \langle f_2:P_\Omega\rangle \qquad (3.47)$$

Evidently, then, σ² will be greater or less than σ_1²+σ_2² according to whether
cov(f_1,f_2:P_Ω) is positive or negative. We also show in Appendix D that
cov(f_1,f_2:P_Ω) is bounded according to [cf. (D.9)]

$$-\sigma_1\sigma_2 \le \mathrm{cov}(f_1,f_2:P_\Omega) \le +\sigma_1\sigma_2 \qquad (3.48)$$

Inserting this into (3.46) yields the inequality

$$|\sigma_1 - \sigma_2| \le \sigma \le \sigma_1 + \sigma_2 \qquad (3.49)$$


We may interpret these results as follows: If the functions f_1(x)
and f_2(x) are such that cov(f_1,f_2:P_Ω) assumes its maximum possible value
of +σ_1σ_2, then f_1(x) and f_2(x) are said to be "maximally positively
correlated", and the rms deviation σ of the set {f_1(x_i)+f_2(x_i)} is equal
to σ_1+σ_2. If the functions f_1(x) and f_2(x) are such that cov(f_1,f_2:P_Ω)
vanishes, then the rms deviation σ of the set {f_1(x_i)+f_2(x_i)} is equal to
√(σ_1²+σ_2²); this is the same as one would get by forming the sums of
independently chosen numbers from the two sets {f_1(x_i)} and {f_2(x_i)}.
Finally, if the functions f_1(x) and f_2(x) are such that cov(f_1,f_2:P_Ω)
assumes its minimum possible value of -σ_1σ_2, then f_1(x) and f_2(x) are said
to be "maximally negatively correlated", and the rms deviation σ of the set
{f_1(x_i)+f_2(x_i)} is equal to |σ_1-σ_2|.

In the present instance, it is clearly advantageous for f_1(x) and f_2(x)
to be negatively correlated; for then the uncertainty in the estimate of I
obtained using a single set of N points via (3.45) would actually be less
than the uncertainty obtained using two sets of N points via (3.44).
Generally speaking, f_1(x) and f_2(x) will be negatively correlated if the
minima of f_1(x) tend to occur in regions where f_2(x) has its maxima, and
vice-versa.
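
The effect of correlation on a "parallel" calculation can be seen in a small experiment. The Python sketch below is illustrative only: it assumes a one-dimensional region Ω = [0, 1) so that |Ω| = 1, and the name `parallel_moments` is hypothetical. It accumulates the sample moments of f_1, f_2 and f = f_1 + f_2 over one shared set of points, from which the sample versions of σ_1², σ_2², σ² and the covariance follow.

```python
import math
import random

def parallel_moments(f1, f2, n, seed=0):
    """Sample variances and covariance of f1, f2 and f1 + f2 on shared points.

    Points are uniform on [0, 1), an assumption made to keep the sketch short.
    """
    rng = random.Random(seed)
    s1 = s2 = s = 0.0      # running sums of f1, f2, f
    q1 = q2 = q = 0.0      # running sums of f1^2, f2^2, f^2
    p12 = 0.0              # running sum of f1 * f2
    for _ in range(n):
        x = rng.random()
        y1 = f1(x)
        y2 = f2(x)
        y = y1 + y2
        s1 += y1
        s2 += y2
        s += y
        q1 += y1 * y1
        q2 += y2 * y2
        q += y * y
        p12 += y1 * y2
    m1, m2, m = s1 / n, s2 / n, s / n
    return {
        "var1": q1 / n - m1 * m1,
        "var2": q2 / n - m2 * m2,
        "var": max(q / n - m * m, 0.0),
        "cov": p12 / n - m1 * m2,
    }

if __name__ == "__main__":
    # f1(x) = x and f2(x) = 1 - x are maximally negatively correlated.
    print(parallel_moments(lambda x: x, lambda x: 1.0 - x, 10000))
```

With f_1(x) = x and f_2(x) = 1 - x the sum is constant, so the sample σ collapses to |σ_1 - σ_2| = 0, while two independent runs would still carry an uncertainty governed by √(σ_1²+σ_2²).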


3-5. Monte Carlo versus Other Numerical Integration Methods

At  this  point  it  seems  appropriate  to  make  a few  brief  comments  on  the 
relative  advantages  and  disadvantages  of  the  Monte  Carlo  method  as  compared 
with  the  more  conventional  numerical  integration  methods. 

If the integral is one-dimensional and the integrand is fairly smooth,
then the standard quadrature methods are far superior to Monte Carlo. For
example, the relatively crude Trapezoid Rule will have in this case an
associated uncertainty which decreases like 1/N² as the number N of (evenly
spaced) sampling points increases; by contrast, the uncertainty associated
with the Monte Carlo method decreases only like 1/N^(1/2) as the number N of
(random) sampling points increases. However, for integrals of high
dimensionality the situation can be otherwise. For example, in d dimensions
the uncertainty inherent in the Trapezoid Rule decreases only as 1/N^(2/d),
whereas the uncertainty in the Monte Carlo method remains proportional to
1/N^(1/2). Moreover, the extensions of the conventional methods to higher
dimensions are usually quite complicated, whereas the Monte Carlo procedure
is rather insensitive to the dimensionality of the integral (particularly if
the integration region Ω is box-like). It should also be pointed out that the
conventional methods and their attendant estimates of the uncertainty usually
require the integrand to be a fairly "smooth" function, whereas the Monte
Carlo method easily accommodates finite step discontinuities of the type
frequently occurring in integrals of physical interest. In light of these
considerations, it is seen that the Monte Carlo method can be a very sensible
choice for complicated, multi-dimensional integrals. A rough rule-of-thumb
is that, for integrals which cannot be conveniently reduced analytically
below a dimensionality of 4 or 5, the Monte Carlo method should be given
very serious consideration.
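
The one-dimensional convergence rates quoted above are easy to observe directly. The following Python sketch is an illustration under simple assumptions (a smooth integrand on a finite interval); the function names `trapezoid` and `monte_carlo` are hypothetical.

```python
import math
import random

def trapezoid(f, a, b, n):
    """Trapezoid Rule with n evenly spaced points (n - 1 panels)."""
    h = (b - a) / (n - 1)
    interior = sum(f(a + i * h) for i in range(1, n - 1))
    return h * (0.5 * (f(a) + f(b)) + interior)

def monte_carlo(f, a, b, n, seed=0):
    """Plain Monte Carlo estimate and its one-sigma uncertainty on [a, b]."""
    rng = random.Random(seed)
    s1 = 0.0
    s2 = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        s1 += y
        s2 += y * y
    mean = s1 / n
    var = max(s2 / n - mean * mean, 0.0)
    return (b - a) * mean, (b - a) * math.sqrt(var / n)

if __name__ == "__main__":
    # Integral of sin(x) over [0, pi]; the exact value is 2.
    for n in (101, 201, 401):
        err = abs(trapezoid(math.sin, 0.0, math.pi, n) - 2.0)
        _, unc = monte_carlo(math.sin, 0.0, math.pi, n)
        print(n, err, unc)
```

Doubling the number of panels should cut the Trapezoid error by about four, while the Monte Carlo uncertainty needs four times as many points just to halve.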

A particularly attractive feature of the Monte Carlo method is its
operational simplicity, and the relative ease with which it produces not
only the estimate of the integral but also the uncertainty in this estimate.
For this reason, it sometimes makes good sense to use the Monte Carlo method
even when it is not the most "efficient" method (in the sense of producing
an answer of given accuracy using a minimum of computer time), for one can
often obtain a sufficiently accurate answer at an acceptable computer cost
using Monte Carlo much faster than one can implement a more efficient but
more complicated standard numerical method. And even if it is desired to
have the normally greater accuracy of the conventional methods when the
dimensionality is less than 5, a simple Monte Carlo calculation can often
provide a reassuring independent check against gross errors.

Finally it should be noted that, from the standpoint of the computer,
the Monte Carlo method makes very minimal demands upon storage capacity and
input/output devices. In particular, one does not have to store lots of
points x and/or their associated f-values [see Fig. 9]. Thus, when executing
a Monte Carlo program, the computer will normally be "compute bound", or
limited only by the speed with which it can perform standard arithmetical and
logical operations. From a strictly financial point of view (which may well
be the most sensible measure of "efficiency") one should therefore pay
attention to whether one's computer charges are calculated on the basis of
"cpu time" (i.e., the time actually spent by the central processing unit
in carrying out the required arithmetical and logical operations), or on
the basis of "core time" (i.e., the cpu time weighted by the memory storage
used and the number of times input/output devices are accessed). For a
typical compute-bound Monte Carlo calculation, core-time charges can range
from 2/3 to only 1/10 of the cpu-time charges. Until recently most
computer centers charged on the basis of cpu time; however, for the newer
computers which run in a time-sharing mode, core time has been shown to
provide a more realistic and equitable basis for charging users. As a
result, most large computer centers now charge on the basis of core time, a
fact which makes the Monte Carlo method today even more attractive.

Despite the foregoing advantages of the Monte Carlo method in certain
circumstances, it nevertheless frequently happens that one plays the Monte
Carlo game through according to the rules, but finds that one's answer has
an uncertainty which is simply too large. Since increasing N (and therefore
the computer running time, and therefore the cost) by a factor of k decreases
the uncertainty by a factor of only 1/√k, one is tempted in such cases to
discard the Monte Carlo method as unsuitable. While this may indeed be the
appropriate course of action, one should not take this step without giving
some consideration to the variance reducing techniques which we shall
outline in Chapter 4. Essentially, these techniques try to decrease the
numerator of the uncertainty in (3.21) without significantly increasing the
required computer time. It is probably fair to say that the relatively
recent recognition and use of these variance reducing techniques has, more
than anything else, served to elevate Monte Carlo to the level of a
"respectable colleague" of the conventional numerical quadrature methods.


Chapter  4 


VARIANCE REDUCING TECHNIQUES


4-1. General Considerations

We have seen that the straightforward Monte Carlo method of estimating
the integral

$$I = \int_\Omega f(x)\,dx \qquad (4.1)$$

consists of first picking N points x_1, x_2, ..., x_N from the set of random
points {x_i} distributed uniformly over Ω (i.e., according to the density
function P_Ω(x) in (3.4)) and then putting

$$I \approx \tilde{I} \pm \Delta \qquad (4.2)$$

where the estimate Ĩ is

$$\tilde{I} = |\Omega| \, \frac{1}{N} \sum_{i=1}^{N} f(x_i) \qquad (4.3)$$

and the uncertainty Δ is

$$\Delta = |\Omega| \sqrt{\frac{\mathrm{var}(f:P_\Omega)}{N}} \qquad (4.4)$$

In (4.4), var(f:P_Ω) is just the variance σ² of the set of random numbers
{f(x_i)} [see (3.16)],

$$\mathrm{var}(f:P_\Omega) \equiv \langle f^2:P_\Omega\rangle - \langle f:P_\Omega\rangle^2 \qquad (4.5)$$

which in actual computations is always approximated according to [see
(3.17)]

$$\mathrm{var}(f:P_\Omega) \approx \frac{1}{N} \sum_{i=1}^{N} f^2(x_i) - \left( \frac{1}{N} \sum_{i=1}^{N} f(x_i) \right)^2 \qquad (4.6)$$
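
The estimate, the uncertainty and the sample variance fit in a few lines of code. The Python sketch below is illustrative, not the author's program: it assumes the caller supplies a function that draws one point uniformly from Ω together with the exact volume |Ω|, and the names `mc_integrate`, `sample` and `volume` are hypothetical.

```python
import math
import random

def mc_integrate(f, sample, volume, n, seed=0):
    """Straightforward Monte Carlo estimate of the integral of f over Omega.

    sample(rng) must return one point uniformly distributed over Omega, and
    volume is |Omega|. Returns (estimate, uncertainty) in the sense of (4.2).
    """
    rng = random.Random(seed)
    s1 = 0.0   # running sum of f(x_i)
    s2 = 0.0   # running sum of f(x_i)**2
    for _ in range(n):
        y = f(sample(rng))
        s1 += y
        s2 += y * y
    mean = s1 / n
    var = max(s2 / n - mean * mean, 0.0)    # sample variance, cf. (4.6)
    estimate = volume * mean                # cf. (4.3)
    delta = volume * math.sqrt(var / n)     # cf. (4.4)
    return estimate, delta

if __name__ == "__main__":
    # Integral of x^2 over [0, 2]; the exact value is 8/3.
    draw = lambda rng: rng.uniform(0.0, 2.0)
    for n in (10000, 40000):
        print(n, mc_integrate(lambda x: x * x, draw, 2.0, n))
```

Only two running sums are kept, so no points or f-values need to be stored; quadrupling n should roughly halve the reported uncertainty.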

It is clear from (4.4) that the uncertainty Δ can in principle be
made as small as desired by taking N large enough. However, since the
time required to perform the calculations is roughly proportional to N,
in practice the size of N is limited by the amount of computer time
available; for example, in order to halve the Monte Carlo uncertainty
obtained in a one hour computer run, we would have to perform a four hour
computer run. Clearly, then, beyond a certain point it is simply not
feasible to decrease the uncertainty by increasing N. It would seem that
the only alternative to increasing N would be to somehow modify f and/or
Ω in such a way that the value of I is essentially left unchanged, while
the quantity var(f:P_Ω) gets replaced by something smaller. Several general
procedures have been devised for accomplishing this, and in the present
chapter we shall give a brief discussion of four of these so-called "variance
reducing" techniques. Whether or not any of these techniques can be
profitably utilized in any given instance will depend very strongly upon the
specific form of the integrand f(x) and the integrating region Ω, as well
as upon the resourcefulness of the person doing the calculation. For this
reason we shall not be able to develop specific recipes for blindly applying
these variance reducing techniques; all we can do is outline their basic
strategies.

To get a general idea of just what is involved in "reducing the variance",
it may be helpful to recall the discussion of Section 3-2. There we
denoted by P(y) the density function of the set of random numbers
{y_i} = {f(x_i)}. In principle, the function P(y) is uniquely determined by
the two functions f(x) and P_Ω(x) [see Fig. 8], but in practice it is never
possible to calculate P(y) analytically. Nevertheless, it is precisely the
"center of gravity" of the curve P(y)-versus-y, namely <y:P> = <f:P_Ω>, which
when multiplied by |Ω| gives the sought for value of the integral I. Now,
essentially what we do in a Monte Carlo calculation is to approximate the
curve P(y)-versus-y by a frequency histogram of N randomly chosen numbers
from the set {y_i} = {f(x_i)}. Provided N is sufficiently large, the center
<y:P>_N = <f:P_Ω>_N of the frequency histogram will approximate the center
<y:P> = <f:P_Ω> of the P(y) curve to within a ± uncertainty equal to the
width √var(y:P) = √var(f:P_Ω) of the P(y) curve divided by √N. In an actual
calculation, of course, we also approximate the width of the P(y) curve by
the width of the frequency histogram, which is given by the square root of
the quantity on the right side of (4.6). It is sometimes helpful to actually
plot a frequency histogram of the y_i = f(x_i) values in the course of
carrying out a Monte Carlo calculation†, since such a histogram graphically
illustrates just what one is up against in obtaining an accurate Monte Carlo
estimate of the integral at hand: The broader this frequency histogram, the
more sensitive its center will be to random fluctuations arising from the
finiteness of the number N of y_i-values sampled, and hence the more
uncertainty will be associated with the fundamental approximation
<y:P> ≈ <y:P>_N.

† Such a histogram should be built up continuously as each new y_i = f(x_i)
value is obtained, rather than all at once at the end of the calculation.
The idea is to avoid having to store all the y-values in computer memory.
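
A histogram that is built up one value at a time, as the footnote recommends, needs only a fixed array of bin counts and two running sums. The Python sketch below is one possible arrangement; the class name `RunningHistogram`, the fixed bin range and the handling of out-of-range values are all assumptions made for illustration.

```python
import math

class RunningHistogram:
    """Frequency histogram of y-values, updated one value at a time."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.hi = hi
        self.counts = [0] * bins
        self.outside = 0      # values that fall outside [lo, hi)
        self.n = 0
        self.s1 = 0.0         # running sum of y
        self.s2 = 0.0         # running sum of y**2

    def add(self, y):
        """Record one new y-value without storing it."""
        self.n += 1
        self.s1 += y
        self.s2 += y * y
        if self.lo <= y < self.hi:
            width = (self.hi - self.lo) / len(self.counts)
            k = min(int((y - self.lo) / width), len(self.counts) - 1)
            self.counts[k] += 1
        else:
            self.outside += 1

    def center_and_width(self):
        """Sample mean and rms width of the values seen so far, cf. (4.6)."""
        mean = self.s1 / self.n
        var = max(self.s2 / self.n - mean * mean, 0.0)
        return mean, math.sqrt(var)

if __name__ == "__main__":
    hist = RunningHistogram(0.0, 1.0, 4)
    for y in (0.1, 0.3, 0.3, 0.9, 1.5):
        hist.add(y)
    print(hist.counts, hist.outside, hist.center_and_width())
```

The memory used is independent of N, and the center and width reported are the same quantities that enter the Monte Carlo estimate and its uncertainty.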

Of course, a frequency histogram of the f(x_i)-values should not be
confused with a plot of f(x)-versus-x. Indeed, the shapes of these two
curves are sort of inversely related to each other: If f(x) is relatively
constant over Ω, then the frequency histogram of the f(x_i)-values (or
equivalently, the curve P(y)-versus-y) will consist primarily of a single,
narrow peak, implying that var(f:P_Ω) will be relatively small. On the other
hand, if f(x) is peaked and consequently assumes a wide range of values in
Ω, then the frequency histogram of the f(x_i)-values (or equivalently, the
curve P(y)-versus-y) will be broadly spread out, implying that var(f:P_Ω)
will be relatively large.

Taking all these things into consideration, it is clear that any variance
reducing technique must aim at modifying things in such a way that the value
of the integral is unchanged, but the integrand is rendered flatter or more
nearly constant over the integrating region. The consequent narrowing and
sharpening of the density function P(y) of the set of integrand values {y_i}
will make its true center easier to locate by a finite sampling procedure,
thereby reducing the Monte Carlo uncertainty Δ. Indeed, if we could
somehow arrange to wind up with an integrand which is perfectly constant, then
the density function of the integrand values would be a single spike at
that constant value with zero width, and our Monte Carlo estimate would be
exact. Of course, in such a case the integral could be performed analytically,
and Monte Carlo would not be needed. But the important thing here is that
the closer we can get to the ideal situation of a constant integrand the
more accurate our Monte Carlo calculation is going to be. This, in
essence, is the guiding philosophy of all variance reducing techniques.

In  Sections  4-2  through  4-5  we  shall  sketch  four  different  strategies 
for  reducing  the  variance  in  a Monte  Carlo  integration.  These  strategies 
are  called  control  variates,  antithetic  variates,  stratified  sampling, 
and  importance  sampling.  We  shall  see  that,  despite  these  somewhat  esoteric 
names,  the  underlying  principle  of  each  method  is  really  quite  straightforward. 
This  writer’s  practical  experience  has  been  mainly  with  the  last  technique 
(importance  sampling),  and  in  Section  4-6  we  shall  describe  a crude  but 
often  effective  procedure  for  applying  that  technique  in  a rather  routine 
way. 

4-2.  Control  Variates 

Suppose we can find a function f_0(x) whose integral over Ω is known
exactly:

$$I_0 = \int_\Omega f_0(x)\,dx \qquad (4.7)$$

Then the given integral I in (4.1) can be written

$$I = I_0 + \int_\Omega [f(x) - f_0(x)]\,dx \qquad (4.8)$$




Now, if f_0(x) has a sufficiently strong correlation with f(x), so that
f_0(x) tends to be large where f(x) is large and small where f(x) is
small, then the function [f(x)-f_0(x)] will be more nearly constant over Ω
than f(x) alone is. In such a circumstance, the integral I can
evidently be determined more accurately through (4.8) by performing a Monte
Carlo integration of [f(x)-f_0(x)] instead of by Monte Carlo integrating
f(x) itself. This strategy is known as the "control variates" method:
essentially, the fluctuations in the variate f(x), whose mean over
Ω is not known, are to some extent "controlled" by the fluctuations in
the variate f_0(x), whose mean over Ω is known.


More quantitatively, the ratio of the uncertainty Δ* in calculating
with (4.8) to the uncertainty Δ in calculating with (4.1), assuming the
same number of random points in Ω are used, is evidently

$$\frac{\Delta^*}{\Delta} = \sqrt{\frac{\mathrm{var}(f - f_0:P_\Omega)}{\mathrm{var}(f:P_\Omega)}} \qquad (4.9)$$

Now, according to (D.6)

$$\mathrm{var}(f - f_0:P_\Omega) = \mathrm{var}(f:P_\Omega) + \mathrm{var}(f_0:P_\Omega) - 2\,\mathrm{cov}(f,f_0:P_\Omega)$$

where the covariance of two functions with respect to a given set of
random points is defined and discussed in Appendix D. Hence, we will
evidently have Δ* < Δ provided

$$\mathrm{cov}(f,f_0:P_\Omega) > \tfrac{1}{2}\,\mathrm{var}(f_0:P_\Omega) \qquad (4.10)$$


This last inequality tells us precisely how strong the correlation between
f_0(x) and f(x) must be in order for (4.8) to yield a more accurate
Monte Carlo estimate of I than (4.1). Of course, in practice it is
never possible to ascertain beforehand whether or not a chosen function
f_0(x) satisfies this requirement, because cov(f,f_0:P_Ω) is defined in terms
of integrals which are generally more complicated than I itself [see (D.4)].
Therefore, in an actual calculation one would have to be content with finding
an integrable function f_0(x) whose maxima and minima roughly correspond
to those of f(x). A short test Monte Carlo run could then resolve the
question of whether or not var(f-f_0:P_Ω) is significantly less than var(f:P_Ω)
simply by directly estimating these two quantities in the usual way [see
(4.6)].
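
The short test run described above can be sketched as follows. This Python example is illustrative only: it assumes Ω is the unit interval [0, 1), so that |Ω| = 1, and uses f(x) = exp(x) with the hypothetical control function f_0(x) = 1 + x, whose integral I_0 = 3/2 is known exactly. The function name `control_variate_run` is also an assumption.

```python
import math
import random

def control_variate_run(f, f0, i0, n, seed=0):
    """Integrate f over [0, 1) directly and via I = I0 + integral of (f - f0).

    f0 is the control function and i0 its exactly known integral over [0, 1).
    Returns ((plain estimate, uncertainty), (controlled estimate, uncertainty)).
    """
    rng = random.Random(seed)
    s_f = q_f = 0.0    # running sums of f and f^2
    s_d = q_d = 0.0    # running sums of (f - f0) and (f - f0)^2
    for _ in range(n):
        x = rng.random()
        y = f(x)
        d = y - f0(x)
        s_f += y
        q_f += y * y
        s_d += d
        q_d += d * d
    mean_f = s_f / n
    mean_d = s_d / n
    var_f = max(q_f / n - mean_f ** 2, 0.0)   # estimates var(f:P)
    var_d = max(q_d / n - mean_d ** 2, 0.0)   # estimates var(f - f0:P)
    plain = (mean_f, math.sqrt(var_f / n))              # via (4.1)
    controlled = (i0 + mean_d, math.sqrt(var_d / n))    # via (4.8)
    return plain, controlled

if __name__ == "__main__":
    # Exact value of the integral of exp(x) over [0, 1] is e - 1.
    print(control_variate_run(math.exp, lambda x: 1.0 + x, 1.5, 20000))
```

Both estimates come from the same points, so comparing the two reported uncertainties shows directly whether the chosen f_0(x) is worth using.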

We see, then, that the difficulty in applying the control variates
method lies not in determining whether (4.10) holds for a given function
f_0(x), but rather in discovering a suitable f_0(x) in the first place. On
the one hand, f_0(x) must be simple enough that its integral over Ω can be
calculated analytically; on the other hand, f_0(x) must be intricate enough
to follow the major ups and downs of the presumably complicated function
f(x). Therefore, the practical limitations on the control variates method
are essentially those imposed by one's limited knowledge of the detailed
"shape" of the given function f(x) over Ω, as well as one's limited ability
to find or construct exactly integrable functions f_0(x) of a similar shape.

4-3.  Antithetic  Variates 

Suppose we can find a function $f_0(\vec{x})$ whose integral over $\Omega$ is
known to be equal to the integral of the given function $f(\vec{x})$ over
$\Omega$ (even though, of course, the numerical value of that common integral
is unknown):

$$\int_\Omega f_0(\vec{x})\,d\vec{x} = \int_\Omega f(\vec{x})\,d\vec{x} = I \tag{4.11}$$

Then the given integral $I$ in (4.1) can be written

$$I = \int_\Omega \tfrac{1}{2}\,[f(\vec{x}) + f_0(\vec{x})]\,d\vec{x} \tag{4.12}$$

Now, if $f_0(\vec{x})$ has a sufficiently strong anticorrelation with
$f(\vec{x})$, so that $f_0(\vec{x})$ tends to be large where $f(\vec{x})$ is
small and small where $f(\vec{x})$ is large, then the function
$\tfrac{1}{2}[f(\vec{x})+f_0(\vec{x})]$ will be more nearly constant over
$\Omega$ than $f(\vec{x})$ is. In such a circumstance, the integral $I$ can
evidently be determined more accurately through (4.12) by performing a Monte
Carlo integration of $\tfrac{1}{2}[f(\vec{x})+f_0(\vec{x})]$ instead of by
Monte Carlo integrating $f(\vec{x})$ itself. This strategy is known as the
"antithetic variates" method: essentially, the fluctuations in the variate
$f(\vec{x})$ tend to be cancelled by the opposing fluctuations in the variate
$f_0(\vec{x})$, with the result that the fluctuations in the variate
$\tfrac{1}{2}[f(\vec{x})+f_0(\vec{x})]$ are smaller than either.

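More quantitatively, the ratio of the uncertainty $\Delta^*$ in calculating
$I$ with (4.12) to the uncertainty $\Delta$ in calculating $I$ with (4.1),
assuming the same number of random points in $\Omega$ are used, is evidently

$$\frac{\Delta^*}{\Delta} = \frac{\sqrt{\mathrm{var}(\tfrac{1}{2}[f+f_0]:P_\Omega)}}{\sqrt{\mathrm{var}(f:P_\Omega)}} \tag{4.13}$$

Now, according to (D.6),

$$\begin{aligned}
\mathrm{var}(\tfrac{1}{2}[f+f_0]:P_\Omega)
 &= \tfrac{1}{4}\mathrm{var}(f:P_\Omega) + \tfrac{1}{4}\mathrm{var}(f_0:P_\Omega) + \tfrac{1}{2}\mathrm{cov}(f,f_0:P_\Omega) \\
 &= \mathrm{var}(f:P_\Omega) - \tfrac{3}{4}\mathrm{var}(f:P_\Omega) + \tfrac{1}{4}\mathrm{var}(f_0:P_\Omega) + \tfrac{1}{2}\mathrm{cov}(f,f_0:P_\Omega)
\end{aligned}$$

Hence, we will evidently have $\Delta^* < \Delta$ provided

$$\mathrm{cov}(f,f_0:P_\Omega) < \tfrac{3}{2}\mathrm{var}(f:P_\Omega) - \tfrac{1}{2}\mathrm{var}(f_0:P_\Omega) \tag{4.14}$$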

This last inequality tells us precisely how strong the anticorrelation
between $f_0(\vec{x})$ and $f(\vec{x})$ must be in order for (4.12) to yield a
more accurate Monte Carlo estimate of $I$ than (4.1). Of course, in practice
it is never possible to ascertain beforehand whether or not a chosen function
$f_0(\vec{x})$ satisfies this requirement, because
$\mathrm{cov}(f,f_0:P_\Omega)$ is defined in terms of integrals which are
generally more complicated than $I$ itself. Therefore, in an actual
calculation one would have to be content with finding some function
$f_0(\vec{x})$ whose integral over $\Omega$ is known to equal $I$ and whose
maxima and minima roughly correspond to the respective minima and maxima of
$f(\vec{x})$. A short test Monte Carlo run could then resolve the question of
whether or not $\mathrm{var}(\tfrac{1}{2}[f+f_0]:P_\Omega)$ is significantly
less than $\mathrm{var}(f:P_\Omega)$ simply by directly estimating these two
quantities in the usual way [see (4.6)].

Clearly, the difficulty in applying the antithetic variates method
lies in finding a suitable function $f_0(\vec{x})$, just as in the control
variates method. On the surface it might seem that it would be exceedingly
difficult to find a function which, on the one hand, has the same integral
over $\Omega$ as the given function $f(\vec{x})$, while on the other hand is
strongly anticorrelated with $f(\vec{x})$. Usually the most feasible way to
proceed is to define $f_0(\vec{x})$ in terms of $f(\vec{x})$ itself. As a
simple illustration of how this can be done, consider the Monte Carlo
evaluation of the one-dimensional integral

$$I = \int_a^b f(x)\,dx \tag{4.15a}$$

Suppose we define $f_0(x)$ by

$$f_0(x) = f(a+b-x) \tag{4.15b}$$

That this function $f_0(x)$ satisfies the fundamental requirement that its
integral from $a$ to $b$ equals $I$ is easily proved by changing integration
variables according to $x \to x' = a+b-x$. Thus we have as in (4.12)

$$I = \int_a^b \tfrac{1}{2}[f(x) + f_0(x)]\,dx = \int_a^b \tfrac{1}{2}[f(x) + f(a+b-x)]\,dx \tag{4.15c}$$

Now, if it happens that $f(x)$ is monotonically increasing (decreasing) in
$a<x<b$, then $f_0(x)$ will be monotonically decreasing (increasing) in
$a<x<b$; as a consequence, the integrand in (4.15c) will be more nearly
constant over $a<x<b$ than $f(x)$, and will thus have a smaller variance.
Indeed, if $f(x)$ were the simple linear function $Ax+B$, then the integrand
in (4.15c) would be a constant, and the variance would be zero.
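
The one-dimensional construction (4.15b)-(4.15c) can be tried out with a few
lines of code. The following is a minimal sketch under stated assumptions:
the monotonic test integrand exp(x), the linear test function 3x + 2, the
interval (0,1), and the helper names `variance` and `antithetic_integrand`
are all hypothetical choices made for illustration.

```python
import math
import random

def variance(values):
    """Sample estimate of the variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def antithetic_integrand(f, a, b):
    """Return the integrand of (4.15c): x -> [f(x) + f(a+b-x)] / 2."""
    return lambda x: 0.5 * (f(x) + f(a + b - x))

a, b = 0.0, 1.0
rng = random.Random(2024)
xs = [a + (b - a) * rng.random() for _ in range(2000)]

# Monotonic integrand: the antithetic integrand should vary much less.
g = antithetic_integrand(math.exp, a, b)
var_plain = variance([math.exp(x) for x in xs])
var_anti = variance([g(x) for x in xs])

# Linear integrand Ax + B: the antithetic integrand is constant.
linear = lambda x: 3.0 * x + 2.0
g_lin = antithetic_integrand(linear, a, b)
var_lin = variance([g_lin(x) for x in xs])

print(var_plain, var_anti, var_lin)
```

For the linear case the printed variance differs from zero only by
floating-point rounding.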

In less trivial multidimensional applications, one can try to construct a
suitable function $f_0(\vec{x})$ in terms of $f(\vec{x})$ in an analogous way.
Specifically, one puts

$$f_0(\vec{x}) \equiv f(\vec{x}\,') \tag{4.16a}$$

where $\vec{x} \to \vec{x}\,'$ is a transformation which satisfies the two
conditions

$$\begin{aligned}
&\bullet\ \vec{x} \to \vec{x}\,' \text{ maps } \Omega \text{ onto itself} \\
&\bullet\ \text{the Jacobian } \left|\frac{\partial \vec{x}}{\partial \vec{x}\,'}\right| = 1
\end{aligned} \tag{4.16b}$$


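These two conditions insure that the integral of $f_0(\vec{x})$ over $\Omega$
is equal to the integral of $f(\vec{x})$ over $\Omega$, since

$$\int_\Omega f_0(\vec{x})\,d\vec{x} = \int_\Omega f(\vec{x}\,')\,d\vec{x} = \int_\Omega f(\vec{x}\,')\left|\frac{\partial \vec{x}}{\partial \vec{x}\,'}\right| d\vec{x}\,' = \int_\Omega f(\vec{x}\,')\,d\vec{x}\,'$$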

The remaining details of the transformation $\vec{x} \to \vec{x}\,'$ [that is,
all properties not specified by (4.16b)] are then chosen in such a way that
the transformation tends to carry points for which $f(\vec{x})$ is large into
points for which $f(\vec{x})$ is small, and vice versa; this will result in
$f_0(\vec{x})$ in (4.16a) being "anticorrelated" with $f(\vec{x})$. Clearly,
one needs to know a good deal about the behavior of $f(\vec{x})$ in $\Omega$
in order to devise such a transformation which will anticorrelate
$f_0(\vec{x})$ and $f(\vec{x})$ to a sufficient degree that
$\mathrm{var}(\tfrac{1}{2}(f_0+f):P_\Omega)$ will be significantly less than
$\mathrm{var}(f:P_\Omega)$.
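
As a two-dimensional illustration of (4.16a)-(4.16b), the sketch below uses
the reflection (x, y) -> (1-x, 1-y) on the unit square, which maps the square
onto itself with a Jacobian of unit magnitude. The integrand exp(x+y), the
function names, and the sample size are assumptions made for this example
only.

```python
import math
import random

def variance(values):
    """Sample estimate of the variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def f(x, y):
    return math.exp(x + y)    # hypothetical integrand, increasing in x and in y

def reflect(x, y):
    """Transformation x -> x': maps the unit square onto itself, |Jacobian| = 1."""
    return 1.0 - x, 1.0 - y

rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(2000)]

plain = [f(x, y) for x, y in pts]
anti = [0.5 * (f(x, y) + f(*reflect(x, y))) for x, y in pts]

var_plain, var_anti = variance(plain), variance(anti)
print(var_plain, var_anti)
```

The reflection works here only because the assumed integrand is large in one
corner of the square and small in the opposite corner; a different integrand
would call for a different transformation.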

4-4.  Stratified  Sampling 

Let the given integrating region $\Omega$ be partitioned into $n$ subregions
$\Omega_1, \Omega_2, \ldots, \Omega_n$. Then the integral $I$ in (4.1) can be
written

$$I = \sum_{j=1}^{n} I_j \tag{4.17}$$

where $I_j$ is the integral of $f(\vec{x})$ over the subregion $\Omega_j$:

$$I_j = \int_{\Omega_j} f(\vec{x})\,d\vec{x}, \qquad j = 1,2,\ldots,n \tag{4.18}$$

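Suppose now that we Monte Carlo integrate each integral $I_j$ separately; that
is, we pick $N_j$ random points $\vec{x}_1, \ldots, \vec{x}_{N_j}$ from the
uniform distribution inside $\Omega_j$, and we put [cf. (4.2)-(4.4)]

$$I_j \approx \bar{I}_j \pm \Delta_j \tag{4.19}$$

where

$$\bar{I}_j = |\Omega_j|\,\frac{1}{N_j}\sum_{i=1}^{N_j} f(\vec{x}_i) \tag{4.20}$$

and

$$\Delta_j = |\Omega_j|\,\sqrt{\frac{\mathrm{var}(f:P_{\Omega_j})}{N_j}} \tag{4.21}$$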

Since these Monte Carlo estimates of the $n$ integrals $I_j$ are statistically
independent of each other, we can obtain a Monte Carlo estimate of the sum of
the $I_j$ integrals, namely $I$, by applying the prescription in equations
(3.37)-(3.39). Specifically, we may assert that the integral $I$ in (4.1) is
given approximately by the estimate

$$I \approx \bar{I}_1 + \bar{I}_2 + \cdots + \bar{I}_n \tag{4.22a}$$

and further that the $\pm$ Gaussian uncertainty associated with this estimate
is

$$\Delta^* = \left[\Delta_1^2 + \Delta_2^2 + \cdots + \Delta_n^2\right]^{1/2} \tag{4.22b}$$


The foregoing method of estimating the integral $I$ is called the
"stratified sampling" method (for reasons to become apparent later). It
is obviously a legitimate way of proceeding, but the extra effort involved
is clearly pointless unless the uncertainty $\Delta^*$ in (4.22b) is
significantly less than the uncertainty $\Delta$ in (4.4), given that the
total number of points used in the two procedures is the same. The question
of interest, then, is as follows: Given that

$$\sum_{j=1}^{n} N_j = N \tag{4.23}$$

under what conditions (if any) will

$$\Delta^{*2} = \sum_{j=1}^{n} |\Omega_j|^2\,\frac{\mathrm{var}(f:P_{\Omega_j})}{N_j} \tag{4.24}$$

be significantly less than

$$\Delta^2 = |\Omega|^2\,\frac{\mathrm{var}(f:P_\Omega)}{N} \tag{4.25}$$


In addressing this question let us begin by considering a specific
situation which, although somewhat contrived, would obviously be handled
more efficiently by the stratified sampling method than by the ordinary
sampling method. Thus, suppose $f(\vec{x})$ consists of a number of perfectly
flat "plateaus" over $\Omega$, in that

$$f(\vec{x}) = c_j \quad \text{for } \vec{x} \in \Omega_j \qquad (j = 1,2,\ldots,n)$$

where $\Omega_1, \Omega_2, \ldots, \Omega_n$ are a particular set of $n$
non-overlapping subregions whose union is $\Omega$. In such a situation we
would evidently have $\mathrm{var}(f:P_{\Omega_j}) = 0$ for each $j$, so that
the uncertainty $\Delta^*$ in (4.24) associated with a stratified sampling
over the corresponding regions $\Omega_1, \ldots, \Omega_n$ would vanish. On
the other hand, the uncertainty $\Delta$ in (4.25) associated with the regular
method would not vanish unless $f(\vec{x})$ were constant everywhere inside
$\Omega$, i.e., unless the constants $c_j$ were all equal to each other.

To illustrate this situation more concretely, the reader can easily
verify that an ordinary $N$-point Monte Carlo calculation of the integral
over $\Omega = (0,1)$ of the step function

$$f(x) = \begin{cases} 1, & \text{for } 0 \le x \le 1/2 \\ 2, & \text{for } 1/2 < x \le 1 \end{cases}$$

will have an associated uncertainty of

$$\Delta = |\Omega|\,\sqrt{\frac{\mathrm{var}(f:P_\Omega)}{N}} = \frac{1/2}{\sqrt{N}}$$

However, if we partition the integrating region $\Omega = (0,1)$ into the two
subregions $\Omega_1 = (0,1/2)$ and $\Omega_2 = (1/2,1)$, then since $f(x)$ is
constant inside each subregion we will have
$\mathrm{var}(f:P_{\Omega_1}) = \mathrm{var}(f:P_{\Omega_2}) = 0$; therefore,
the uncertainty $\Delta^*$ associated with a stratified sampling procedure
over these two subregions will vanish.
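
The step-function example can be checked numerically. The sketch below is an
illustration only: the helper `mc_estimate`, the seed, and the choice of
N = 1000 points split equally between the two subregions are assumptions,
and the variance is estimated from the sample rather than known exactly.

```python
import math
import random

def step(x):
    return 1.0 if x <= 0.5 else 2.0

def mc_estimate(f, lo, hi, n, rng):
    """n-point uniform Monte Carlo estimate of the integral of f over (lo, hi),
    returned together with its estimated one-standard-deviation uncertainty."""
    size = hi - lo
    vals = [f(lo + size * rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    return size * mean, size * math.sqrt(var / n)

rng = random.Random(1)
N = 1000

# Ordinary sampling over the whole interval (0, 1).
I_plain, d_plain = mc_estimate(step, 0.0, 1.0, N, rng)

# Stratified sampling: N/2 points in each of (0, 1/2) and (1/2, 1).
I1, d1 = mc_estimate(step, 0.0, 0.5, N // 2, rng)
I2, d2 = mc_estimate(step, 0.5, 1.0, N // 2, rng)
I_strat = I1 + I2                          # cf. (4.22a)
d_strat = math.sqrt(d1 ** 2 + d2 ** 2)     # cf. (4.22b)

print(I_plain, d_plain)
print(I_strat, d_strat)
```

The ordinary estimate carries an uncertainty close to (1/2)/sqrt(N), while
the stratified estimate reproduces the exact value 3/2 with zero estimated
uncertainty, because the integrand is constant inside each subregion.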

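Addressing the problem more generally, we show in Appendix E that,
for any given function $f(\vec{x})$ and any partitioning of the given region
$\Omega$ into subregions $\Omega_1, \Omega_2, \ldots, \Omega_n$, we have
[cf. (E.4)]

$$\mathrm{var}(f:P_\Omega) = \sum_j \alpha_j\,\mathrm{var}(f:P_{\Omega_j}) + \sum_{j<k} \alpha_j \alpha_k \left(\langle f:P_{\Omega_j}\rangle - \langle f:P_{\Omega_k}\rangle\right)^2 \tag{4.26a}$$

where

$$\alpha_j = |\Omega_j| / |\Omega| \tag{4.26b}$$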

The two-term structure of (4.26a) reveals that the variance of $f(\vec{x})$
over $\Omega$ can be regarded as coming from two sources: (i) the variations
in $f(\vec{x})$ within each of the various subregions [the first term in
(4.26a)]; and (ii) the variations in $f(\vec{x})$ among the various subregions
[the second term in (4.26a)]. Now, it is clear from equations (4.25) and
(4.24) that, whereas both of these sources of variation contribute to
$\Delta$, only the first source contributes to $\Delta^*$. Therefore, if we
can devise a partitioning of $\Omega$ which minimizes (i), or equivalently
maximizes (ii), then we may expect that $\Delta^*$ will be significantly less
than $\Delta$. This in essence is the guiding philosophy of the stratified
sampling technique: If the integrand $f(\vec{x})$ has several fairly level
plateaus or "strata", then independent samplings over the regions
$\Omega_1, \Omega_2, \ldots, \Omega_n$ under the individual strata will result
in a more accurate estimate of the integral than a sampling over the entire
region $\Omega$ which ignores the strata.
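
The decomposition (4.26a) can be verified numerically. In the sketch below,
a fine midpoint grid stands in for the uniform distribution on (0,1), so that
the identity holds up to rounding error; the integrand, the grid size, and
the three-way partition are hypothetical choices made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def pop_variance(values):
    """Variance with respect to the (equally weighted) set of values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def f(x):
    return x * x + (1.0 if x > 0.3 else 0.0)    # hypothetical integrand with a jump

# Stand-in for the uniform distribution on (0, 1): a fine midpoint grid.
M = 10000
grid = [(i + 0.5) / M for i in range(M)]

# Hypothetical partition of (0, 1) into three subregions of unequal size.
edges = [0.0, 0.3, 0.7, 1.0]
strata = [[f(x) for x in grid if lo <= x < hi]
          for lo, hi in zip(edges[:-1], edges[1:])]
n = len(strata)

alpha = [len(s) / M for s in strata]            # relative sizes, cf. (4.26b)
within = sum(a * pop_variance(s) for a, s in zip(alpha, strata))
between = sum(alpha[j] * alpha[k] * (mean(strata[j]) - mean(strata[k])) ** 2
              for j in range(n) for k in range(j + 1, n))
total = pop_variance([f(x) for x in grid])

print(total, within, between)
```

Only the `within` term enters the stratified uncertainty, so a partition
that shifts variance from `within` to `between` is the one to look for.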
Apart from the problem of determining an effective partitioning
scheme for the region $\Omega$ (and this is really the key problem in applying
the stratified sampling procedure), we also have the problem of deciding
how to apportion the $N$ sampling points among the various subregions
$\Omega_j$. One simple and intuitively plausible way of doing this would be to
make $N_j$ proportional to the size of $\Omega_j$. Thus, we would put

$$N_j = \frac{|\Omega_j|}{|\Omega|}\,N = \alpha_j N \tag{4.27a}$$

This of course is roughly how the points would apportion themselves in an
ordinary "unstratified" sampling procedure. If we insert (4.27a) into the
expression for $\Delta^*$ in (4.24) and then divide by the expression for
$\Delta$ in (4.25), we find that this way of apportioning the points produces
the following ratio of $\Delta^*$ to $\Delta$:

$$\frac{\Delta^*}{\Delta} = \left[\frac{\sum_j \alpha_j\,\mathrm{var}(f:P_{\Omega_j})}{\mathrm{var}(f:P_\Omega)}\right]^{1/2} \tag{4.27b}$$

Now, since the second term in (4.26a) cannot be negative, we see that the
numerator in (4.27b) cannot exceed the denominator [cf. (E.5)]. Thus we
conclude that if the points are apportioned according to the prescription
(4.27a), then regardless of how astutely the partitioning is chosen we shall
have $\Delta^* \le \Delta$. However, it is also clear from (4.26a) that the
more care one takes to exploit any "plateau-like" behavior of $f(\vec{x})$ in
choosing the partitioning, the smaller the ratio $\Delta^*/\Delta$ is going
to be.

Actually, the apportioning of points to each subregion strictly
according to the size of the subregion is not the optimum procedure,
even though as we have seen it can never result in making
$\Delta^* > \Delta$. In Appendix F we show that the best procedure is to make
$N_j$ proportional to the size of $\Omega_j$ times the rms variation of
$f(\vec{x})$ over $\Omega_j$ [cf. (F.9)]:

$$N_j = K\,|\Omega_j|\,\sqrt{\mathrm{var}(f:P_{\Omega_j})} \tag{4.28a}$$

Here the constant $K$ is to be determined by the requirement (4.23). Of
course, we would normally not have any a priori knowledge of the quantities
$\mathrm{var}(f:P_{\Omega_j})$. In practice, therefore, a sensible procedure
to follow would be to apportion the points according to (4.27a) in a short
preliminary calculation, and then adjust the apportionment more along the
lines of (4.28a) on the basis of the estimates for
$\mathrm{var}(f:P_{\Omega_j})$ obtained in the preliminary calculation. By
combining (4.28a) with (4.24) and (4.25), it is a simple matter to show that
this optimum apportionment of sampling points leads to [cf. (F.10)]

$$\frac{\Delta^*}{\Delta} = \frac{\sum_j \alpha_j \sqrt{\mathrm{var}(f:P_{\Omega_j})}}{\sqrt{\mathrm{var}(f:P_\Omega)}} \tag{4.28b}$$

instead of (4.27b).
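
The two apportionment rules can be compared directly once estimates of the
subregion variances are in hand. The sketch below assumes hypothetical
pilot-run results for two subregions of equal size; the function names and
the numbers are illustrative only, and the point counts are left as real
numbers (in an actual calculation they would be rounded to integers).

```python
import math

def proportional_allocation(sizes, N):
    """N_j proportional to |Omega_j|, as in (4.27a)."""
    total = sum(sizes)
    return [N * s / total for s in sizes]

def optimal_allocation(sizes, variances, N):
    """N_j = K |Omega_j| sqrt(var_j), as in (4.28a), with K fixed by (4.23)."""
    weights = [s * math.sqrt(v) for s, v in zip(sizes, variances)]
    K = N / sum(weights)
    return [K * w for w in weights]

def stratified_uncertainty(sizes, variances, counts):
    """Delta* from (4.24)."""
    return math.sqrt(sum(s * s * v / n
                         for s, v, n in zip(sizes, variances, counts)))

# Hypothetical pilot-run results for two subregions of equal size.
sizes = [0.5, 0.5]
pilot_variances = [1.0, 0.04]
N = 1000

n_prop = proportional_allocation(sizes, N)
n_opt = optimal_allocation(sizes, pilot_variances, N)
d_prop = stratified_uncertainty(sizes, pilot_variances, n_prop)
d_opt = stratified_uncertainty(sizes, pilot_variances, n_opt)

print(n_prop, d_prop)
print(n_opt, d_opt)
```

A subregion whose estimated variance is exactly zero would receive no points
under (4.28a); in practice one would still assign it a few points, since the
pilot estimate may be unreliable.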

Again, we should emphasize that the major problem presented by the
stratified sampling method is to discover a sensible way of partitioning
$\Omega$ into subregions $\Omega_1, \ldots, \Omega_n$. Generally speaking, the
method is worthwhile only if the "curve" $f(\vec{x})$-versus-$\vec{x}$
exhibits a quasi-plateaued appearance. One must then be able to associate with
each plateau a subregion $\Omega_j$ whose shape is sufficiently simple that:
(i) its volume $|\Omega_j|$ can be calculated exactly, and (ii) random points
can easily be generated uniformly inside $\Omega_j$. As with the control
variates and the antithetic variates methods, a successful application of the
stratified sampling method clearly requires a fairly detailed knowledge of
the behavior of $f(\vec{x})$ in $\Omega$.

4-5.  Importance  Sampling  (Theory) 

Let $P(\vec{x})$ be any probability density function which is normalized
on, and non-zero in, the given integrating region $\Omega$:

$$\int_\Omega P(\vec{x})\,d\vec{x} = 1 \tag{4.29a}$$

$$P(\vec{x}) > 0 \quad \text{for all } \vec{x} \in \Omega \tag{4.29b}$$

The latter property allows us to multiply and divide the integrand in
(4.1) by $P(\vec{x})$ to obtain the following equivalent expression for the
integral $I$:

$$I = \int_\Omega [f(\vec{x})/P(\vec{x})]\,P(\vec{x})\,d\vec{x} \tag{4.30}$$

With (3.3) we can write this as

$$I = \langle f/P : P \rangle \tag{4.31}$$

which says that the integral (4.1) can be regarded as the average of
$f(\vec{x})/P(\vec{x})$, taken with respect to the set of random points
$\{\vec{x}_i\}$ distributed according to the density function $P(\vec{x})$.
From a Monte Carlo point of view this implies that if
$\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N$ are $N$ random points distributed
according to the density function $P(\vec{x})$, then we can put

$$I \approx \bar{I} \pm \Delta^* \tag{4.32}$$

where

$$\bar{I} = \frac{1}{N}\sum_{i=1}^{N} f(\vec{x}_i)/P(\vec{x}_i) \tag{4.33}$$

and

$$\Delta^* = \sqrt{\frac{\mathrm{var}(f/P:P)}{N}} \tag{4.34}$$


We remark again that, except for the requirements (4.29), the form
of the density function $P(\vec{x})$ is quite arbitrary. One possible choice
for $P(\vec{x})$ is of course the uniform density function $P_\Omega(\vec{x})$
in (3.4). For this choice we have
$f(\vec{x})/P(\vec{x}) = |\Omega| f(\vec{x})$ everywhere inside $\Omega$, so
that (4.31) reduces to (3.5), and (4.32)-(4.34) reduce to (4.2)-(4.4),
respectively. These latter equations have formed the basis for most of our
discussion of the Monte Carlo method of evaluating integrals. However, the
foregoing observations suggest that we need not be inextricably wedded to the
uniform distribution. As we shall see below, whereas the average $\bar{I}$ in
(4.33) is essentially independent of the form of $P(\vec{x})$, the uncertainty
$\Delta^*$ in (4.34) depends rather strongly on the form of $P(\vec{x})$.
Therefore, if the uncertainty which results from the usual procedure of taking
$P(\vec{x}) = P_\Omega(\vec{x})$ [i.e., $\Delta$ in (4.4)] turns out to be
unacceptably large, it might be possible to use some other form for
$P(\vec{x})$ and thereby reduce the uncertainty.
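
The prescription (4.32)-(4.34) can be sketched in code as follows. This is a
minimal illustration under stated assumptions, not the program used in this
report: the integrand x**2 on (0,1), the two densities (uniform, and
P(x) = 2x sampled by inversion), and all function names are hypothetical.

```python
import math
import random

def importance_estimate(f, density, sampler, n, rng):
    """Return (I_bar, Delta*) as in (4.33) and (4.34).

    sampler(rng) must return a point distributed according to density.
    """
    ratios = []
    for _ in range(n):
        x = sampler(rng)
        ratios.append(f(x) / density(x))
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / n
    return mean, math.sqrt(var / n)

f = lambda x: x * x                      # hypothetical integrand; exact integral is 1/3

# Density 1: uniform on (0, 1).
uniform_pdf = lambda x: 1.0
uniform_sampler = lambda rng: rng.random()

# Density 2: P(x) = 2x on (0, 1), sampled by inverting F(x) = x**2 = r.
linear_pdf = lambda x: 2.0 * x
linear_sampler = lambda rng: math.sqrt(1.0 - rng.random())   # lies in (0, 1], never 0

rng = random.Random(99)
I_u, d_u = importance_estimate(f, uniform_pdf, uniform_sampler, 20000, rng)
I_l, d_l = importance_estimate(f, linear_pdf, linear_sampler, 20000, rng)

print(I_u, d_u)
print(I_l, d_l)
```

Both runs should give averages consistent with 1/3, while the estimated
uncertainties differ noticeably between the two densities.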

Let us first verify that $\bar{I}$ in (4.33) is, in the limit of large $N$,
independent of the form of $P(\vec{x})$. Essentially, this follows from the
observation that $\bar{I}$ in (4.33) is by definition
$\langle f/P:P \rangle_N$, and

$$\lim_{N\to\infty} \langle f/P:P \rangle_N = \langle f/P:P \rangle = \int_\Omega [f(\vec{x})/P(\vec{x})]\,P(\vec{x})\,d\vec{x}$$

so

$$\lim_{N\to\infty} \langle f/P:P \rangle_N = \int_\Omega f(\vec{x})\,d\vec{x} \tag{4.35}$$

To appreciate this important point in the context of an actual Monte
Carlo process, let us imagine changing from $P(\vec{x})$ to some new density
function $P'(\vec{x})$. Suppose that, inside a given infinitesimal region
$d\Omega$ of $\Omega$, $P'(\vec{x})$ is $k$ times as large as $P(\vec{x})$:

$$P'(\vec{x}) = k\,P(\vec{x}), \quad \text{for } \vec{x} \in d\Omega$$

This implies that, in the limit of large $N$, we shall sample $k$ times as
many random points inside $d\Omega$ with $P'$ as with $P$. However, each such
point sampled with $P'$ will contribute to the sum in (4.33) the amount

$$f(\vec{x})/P'(\vec{x}) = f(\vec{x})/[k P(\vec{x})] = \tfrac{1}{k}\,[f(\vec{x})/P(\vec{x})], \quad \text{for } \vec{x} \in d\Omega$$

which is precisely $1/k$ times the contribution of each point sampled inside
$d\Omega$ with $P$. Clearly, if we sample $k$ times as many points inside
$d\Omega$ while weighting the contribution of each such point by a factor
$1/k$, then the net contribution of $d\Omega$ to the sum in (4.33) will be
unchanged. Applying this argument to every infinitesimal subregion $d\Omega$
of $\Omega$ [allowing the value of $k$ to vary among these infinitesimal
subregions in accordance with the behavior of the functions $P(\vec{x})$ and
$P'(\vec{x})$], we thus see that $\bar{I}$ in (4.33) is indeed insensitive to
the form of $P(\vec{x})$.

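Such is not the case, however, with $\Delta^*$ in (4.34). From a purely
formal point of view we have

$$\begin{aligned}
\mathrm{var}(f/P:P) &= \langle (f/P)^2 : P \rangle - \langle f/P : P \rangle^2 \\
 &= \int_\Omega [f(\vec{x})/P(\vec{x})]^2\,P(\vec{x})\,d\vec{x} - \left[\int_\Omega [f(\vec{x})/P(\vec{x})]\,P(\vec{x})\,d\vec{x}\right]^2
\end{aligned}$$

Thus

$$\mathrm{var}(f/P:P) = \int_\Omega f^2(\vec{x})\,P^{-1}(\vec{x})\,d\vec{x} - I^2 \tag{4.36}$$

which shows that $\mathrm{var}(f/P:P)$, and hence $\Delta^*$ in (4.34), indeed
depends upon the functional form of $P(\vec{x})$. The question now is, how can
we choose $P(\vec{x})$ to minimize the uncertainty?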

Qualitatively, the answer to this question is rather obvious: To
minimize the variance of $f(\vec{x})/P(\vec{x})$ with respect to any set of
random points, we must choose $P(\vec{x})$ so as to make
$f(\vec{x})/P(\vec{x})$ as constant as possible. The more nearly constant
$f(\vec{x})/P(\vec{x})$ is, the smaller the variations will be among the
individual terms in (4.33), and the less will be the uncertainty associated
with their average $\bar{I}$.

More quantitatively, we prove in Appendix G that the density function
$P_{\min}(\vec{x})$ which minimizes $\mathrm{var}(f/P:P)$, and hence
$\Delta^*$, is†

$$P_{\min}(\vec{x}) = \frac{|f(\vec{x})|}{\int_\Omega |f(\vec{x}\,')|\,d\vec{x}\,'} \tag{4.37}$$

†The results (4.37) and (4.38) were apparently first derived by H. Kahn
(Ref. 9).

In other words, the optimum choice for $P(\vec{x})$ is, apart from a
normalization constant, just the function $|f(\vec{x})|$. Inserting this
optimum density function into (4.36), we find that the smallest possible
value of $\Delta^*$ is [cf. (G.18b)]

$$\Delta^*_{\min} = \frac{2}{\sqrt{N}}\left[\int_{\Omega_+} |f(\vec{x})|\,d\vec{x} \cdot \int_{\Omega_-} |f(\vec{x})|\,d\vec{x}\right]^{1/2} \tag{4.38}$$

where $\Omega_+$ is that portion of $\Omega$ in which $f(\vec{x})$ is
everywhere positive, and $\Omega_-$ is that portion of $\Omega$ in which
$f(\vec{x})$ is everywhere negative. Evidently, we will have
$\Delta^*_{\min} = 0$ if and only if $f(\vec{x})$ never changes sign inside
$\Omega$. For example, suppose $f(\vec{x})$ were everywhere positive inside
$\Omega$. Then we could take in accordance with (4.37)


$$P(\vec{x}) = f(\vec{x}) \Big/ \int_\Omega f(\vec{x}\,')\,d\vec{x}\,' = f(\vec{x})/I \tag{4.39}$$

so that $f(\vec{x})/P(\vec{x})$ would simply be equal to the constant $I$
everywhere inside $\Omega$. In this case we would have
$\mathrm{var}(f/P:P) = \mathrm{var}(I:P) = 0$ [this also follows by
substituting (4.39) into the right side of (4.36)], so that the uncertainty
$\Delta^*$ would vanish. However, this nice state of affairs is somewhat
spoiled by the following considerations: If $\Delta$ in (4.4) is indeed too
large, then $f(\vec{x})$ evidently assumes a wide range of values inside
$\Omega$. In such a case, efficiency considerations would preclude generating
random points $\vec{x}_i$ according to the density function in (4.39) by the
rejection method, and we would have to use the inversion method. Now, in
order to use the inversion method, we must be able to calculate analytically
the "normalizing constant" $1/I$ in (4.39); however, by hypothesis we cannot
do this. Similar considerations apply to the general case in (4.37), and we
are forced to conclude that it is in practice not feasible to choose
$P(\vec{x}) = P_{\min}(\vec{x})$ and so achieve the minimum uncertainty in
(4.38).

Nevertheless, the foregoing results do provide us with something to
aim for in choosing an importance sampling density function $P(\vec{x})$: We
should try to choose $P(\vec{x})$ so that it follows $|f(\vec{x})|$,
proportionately, as closely as possible; that is, $P(\vec{x})$ should tend to
be large where $|f(\vec{x})|$ is large and small where $|f(\vec{x})|$ is
small. This will result in $f(\vec{x})/P(\vec{x})$ being a more constant or
less varying function of $\vec{x}$ in $\Omega$ than $f(\vec{x})$ alone. As a
consequence, $\mathrm{var}(f/P:P)$ will be smaller than
$\mathrm{var}(f|\Omega|:P_\Omega)$, and $\Delta^*$ in (4.34) will be smaller
than $\Delta$ in (4.4).

In so choosing $P(\vec{x})$ to be large where $|f(\vec{x})|$ is large, we will
evidently be "biasing" our random point generating procedure in such a way
that we sample more points $\vec{x}_i$ in those regions of $\Omega$ where
$|f(\vec{x})|$ is relatively large. For this reason this method of reducing
the variance is called "importance sampling": we sample most intensely in the
"important" regions of $\Omega$ where $f(\vec{x})$ contributes most strongly
to the integral. As previously noted, this sampling bias is compensated for
by dividing the value of $f$ at each point by the value of $P$ at the same
point. The result is that we get the same average as in the uniform sampling
case, but since the values $f(\vec{x}_i)/P(\vec{x}_i)$ being averaged exhibit
less variation than $f(\vec{x}_i)$, the uncertainty is reduced.
In importance sampling, we therefore seek a density function $P(\vec{x})$
which is such that (i) $P(\vec{x})$ follows $|f(\vec{x})|$ as closely as
possible, and (ii) random points can be generated according to $P(\vec{x})$
fairly easily. Ultimately of course, these two requirements are incompatible
with each other: On the one hand $P(\vec{x})$ must be intricate enough to
follow the major variations of the presumably complicated function
$f(\vec{x})$, while on the other hand $P(\vec{x})$ must be analytically simple
enough so that an efficient generating algorithm can be devised. Clearly, one
must in practice strive for a reasonable compromise between these two
requirements. The potential success of the method in any particular instance
will therefore hinge upon one's knowledge of the behavior of the integrand,
as well as upon one's ability to construct efficient algorithms for
generating random numbers according to prescribed density functions.

It should be noted that it is quite possible for an importance sampling
procedure to make things worse instead of better: In making $P(\vec{x})$
larger than $P_\Omega(\vec{x})$ in certain regions of $\Omega$, we must make
$P(\vec{x})$ smaller than $P_\Omega(\vec{x})$ in other regions of $\Omega$,
simply in order for $P(\vec{x})$ to satisfy the normalization condition
(4.29a). In our zeal we may inadvertently make $P(\vec{x})$ so small in some
region that the quantity $f^2(\vec{x})P^{-1}(\vec{x})$ in (4.36) becomes
correspondingly too large, and actually increases the overall variance. In
practice, therefore, one must determine a suitable function $P(\vec{x})$ by a
rather cautious and tentative trial-and-error process. In the next section we
shall describe in somewhat more detail one practical approach to this problem.
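
The effect of a good and of a poor choice of density can be seen by
evaluating (4.36) directly for a simple case. The sketch below is
illustrative only: it assumes the integrand x**2 on (0,1), uses a midpoint
rule in place of the exact integrals, and compares three hypothetical
densities, each normalized on (0,1).

```python
def variance_of_ratio(f, density, n=100000):
    """Evaluate (4.36) on (0, 1) by the midpoint rule:
    var(f/P : P) = integral of f**2 / P, minus I**2."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    integral_f = sum(f(x) for x in xs) * h
    integral_f2_over_p = sum(f(x) ** 2 / density(x) for x in xs) * h
    return integral_f2_over_p - integral_f ** 2

f = lambda x: x * x                                      # hypothetical integrand

var_uniform = variance_of_ratio(f, lambda x: 1.0)        # P = 1 (uniform)
var_good = variance_of_ratio(f, lambda x: 2.0 * x)       # P large where f is large
var_bad = variance_of_ratio(f, lambda x: 1.5 - x)        # P small where f is large

print(var_uniform, var_good, var_bad)
```

For this integrand the uniform density gives a variance of 4/45 and the
density 2x gives 1/72, whereas the density 1.5 - x, which is smallest exactly
where the integrand is largest, gives a variance larger than the uniform one.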
4-6.  Importance  Sampling  (Application)

In  this  section  we  shall  outline  a specific  procedure  for  applying 
the  variance  reducing  technique  described  in  the  previous  section.  The 
procedure  is  rather  crude,  but  it  has  the  advantage  of  being  relatively 
routine  and  easy  to  apply.  In  the  author’s  recent  work  (Ref.  4)  this 
procedure  has  been  able  to  decrease  Monte  Carlo  uncertainties  by  amounts 
equivalent  to  increasing  the  number  of  sampling  points  by  anywhere  from  a 
factor  of  2 to  a factor  of  200,  depending  upon  the  integral  considered. 

To apply this importance sampling procedure to the calculation of an
n-dimensional integral of the form

$$I = \int_\Omega f(\vec{x})\,d\vec{x} \tag{4.40}$$

it is convenient to begin by recasting the integral as an integral over
the n-dimensional unit cube,

$$I = \int_0^1 dr_1 \int_0^1 dr_2 \cdots \int_0^1 dr_n\; h(r_1,r_2,\ldots,r_n) \tag{4.41}$$

Such a recasting of the integral can always be carried out, and in fact
it is essentially equivalent to "preparing" the integral for the Monte
Carlo averaging process. To see this more clearly, let us consider the
three-dimensional integral

$$I = \iiint_\Omega f(x,y,z)\,dx\,dy\,dz \tag{4.42}$$

If one can represent the integrating region of (4.42) in the general form

$$\Omega = \{(x,y,z) \mid a_1 \le x \le b_1,\ a_2(x) \le y \le b_2(x),\ a_3(x,y) \le z \le b_3(x,y)\} \tag{4.43}$$

then, provided the boundary functions $a_2, b_2, a_3, b_3$ are simple enough,
one can apply the generalized inversion method to generate random points
uniformly inside $\Omega$. Thus, as described in connection with Eqs.
(2.31)-(2.34), one first "conditions" the density function $P_\Omega(x,y,z)$
in the form $P(x)P(y|x)P(z|x,y)$ and calculates the corresponding
one-variable distribution functions $F(x)$, $F(y|x)$, $F(z|x,y)$. Then, where
$r_1, r_2, r_3$ are three random numbers from a uniform distribution in the
unit interval, one inverts the equations

$$\begin{aligned}
r_1 &= F(x) \\
r_2 &= F(y|x) \\
r_3 &= F(z|x,y)
\end{aligned} \tag{4.44}$$


to obtain a random point $(x,y,z)$ from the uniform distribution inside
$\Omega$. [This procedure is particularly straightforward if $\Omega$ is a
box, in which case all the $a_i$ and $b_i$ are constants; see (2.35).] Using
(4.44) to generate random points uniformly inside $\Omega$, one then proceeds
to calculate $I$ in (4.42) as $|\Omega|$ times the average of $f(x,y,z)$ over
these random points. However, suppose that instead of regarding (4.44) as a
set of "generating formulae", we look upon (4.44) as defining a
transformation of variables. This transformation evidently carries $\Omega$
in $xyz$-space into the unit cube in $r_1 r_2 r_3$-space; moreover, it has the
convenient property that its Jacobian is given simply by [cf. (2.30) and
(2.32)]

$$\frac{\partial(x,y,z)}{\partial(r_1,r_2,r_3)} = |\Omega| \tag{4.45}$$

Thus, the transformed integral

$$I = \iiint_{\text{unit cube}} f(x,y,z)\,\left|\frac{\partial(x,y,z)}{\partial(r_1,r_2,r_3)}\right|\,dr_1\,dr_2\,dr_3$$

takes the form

$$I = \int_0^1 dr_1 \int_0^1 dr_2 \int_0^1 dr_3\; h(r_1,r_2,r_3) \tag{4.46}$$

where

$$h(r_1,r_2,r_3) = |\Omega|\,f(x,y,z) \tag{4.47}$$

with $x$, $y$ and $z$ now being regarded as functions of $r_1$, $r_2$ and
$r_3$ through the inverse of Eqs. (4.44).

Alternatively, if the functions $a_i$ and $b_i$ in (4.43) are analytically
so complicated that Eqs. (4.44) are intractable, one can try the simple
"linear stretching" transformation

$$\begin{aligned}
r_1 &= [x - a_1]/[b_1 - a_1] \\
r_2 &= [y - a_2(x)]/[b_2(x) - a_2(x)] \\
r_3 &= [z - a_3(x,y)]/[b_3(x,y) - a_3(x,y)]
\end{aligned} \tag{4.48}$$

Like (4.44), this transformation also carries $\Omega$ in $xyz$-space into the
unit cube in $r_1 r_2 r_3$-space; however, its Jacobian is given by

$$\frac{\partial(x,y,z)}{\partial(r_1,r_2,r_3)} = [b_1 - a_1]\,[b_2(x) - a_2(x)]\,[b_3(x,y) - a_3(x,y)] \tag{4.49}$$

instead of (4.45). Applying this transformation to $I$ in (4.42) we again
obtain an expression of the form (4.46), except that the integrand is
now given by

$$h(r_1,r_2,r_3) = [b_1 - a_1]\,[b_2(x) - a_2(x)]\,[b_3(x,y) - a_3(x,y)]\,f(x,y,z) \tag{4.50}$$

instead of (4.47), where $x$, $y$ and $z$ are functions of $r_1$, $r_2$ and
$r_3$ through the inverse of (4.48). If one has the option of proceeding
via either (4.44) or (4.48), the optimal way will normally be the one for
which the function $h$ [either (4.47) or (4.50)] is a more nearly constant
function of $r_1$, $r_2$ and $r_3$ inside the unit cube, since this will
produce a smaller uncertainty in the subsequent Monte Carlo averaging of
(4.46).
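
The "linear stretching" route (4.48)-(4.50) can be sketched as follows. The
region, the integrand, and all names below are assumptions made for
illustration: the region is 0 <= x <= 1, 0 <= y <= x, 0 <= z <= x + y, and
the integrand is taken to be 1, so that I is just the volume of the region
(which is 1/2).

```python
import math
import random

# Hypothetical region of the form (4.43); its volume is 1/2.
a1, b1 = 0.0, 1.0
a2 = lambda x: 0.0
b2 = lambda x: x
a3 = lambda x, y: 0.0
b3 = lambda x, y: x + y

def f(x, y, z):
    return 1.0                 # hypothetical integrand

def h(r1, r2, r3):
    """Integrand over the unit cube, as in (4.50)."""
    # Invert (4.48) one coordinate at a time.
    x = a1 + r1 * (b1 - a1)
    y = a2(x) + r2 * (b2(x) - a2(x))
    z = a3(x, y) + r3 * (b3(x, y) - a3(x, y))
    jacobian = (b1 - a1) * (b2(x) - a2(x)) * (b3(x, y) - a3(x, y))   # (4.49)
    return jacobian * f(x, y, z)

rng = random.Random(314)
N = 20000
vals = [h(rng.random(), rng.random(), rng.random()) for _ in range(N)]
I_bar = sum(vals) / N                                          # average of h over the unit cube
delta = math.sqrt(sum((v - I_bar) ** 2 for v in vals) / N / N) # sqrt(var / N)

print(I_bar, delta)
```

Note that h is not constant here even though f is, because the Jacobian
(4.49) varies over the unit cube; this is the price paid for avoiding the
inversion of (4.44).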

If one is not able to specify $\Omega$ in the ordered form of (4.43), one
can try enclosing $\Omega$ in a larger region $\Omega'$ which can be so
represented; for example, $\Omega'$ might be taken to be a simple box-like
region which has $\Omega$ as a subregion. For any such covering region
$\Omega'$ we define

$$f'(x,y,z) = \begin{cases} f(x,y,z), & \text{if } (x,y,z) \in \Omega \\ 0, & \text{if } (x,y,z) \notin \Omega \end{cases} \tag{4.51a}$$

so that the integral (4.42) can be written

$$I = \iiint_{\Omega'} f'(x,y,z)\,dx\,dy\,dz \tag{4.51b}$$

This expression may now be reduced by either of the methods just described
to the form (4.46). If this procedure too proves fruitless, then one
either must find some transformation of variables
$(x,y,z) \to (x',y',z')$ which carries $\Omega$ into a region $\Omega'$ for
which one of the above methods can be applied [cf. (3.30)], or else one must
consider abandoning the Monte Carlo approach altogether.
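
The covering-region device (4.51a)-(4.51b) can be sketched as follows. The
example is hypothetical: the region is taken to be the unit ball, the
covering region is the box (-1,1)^3, and the integrand is 1, so that the
exact answer is the volume 4*pi/3. All function names are assumptions.

```python
import math
import random

def in_omega(x, y, z):
    return x * x + y * y + z * z <= 1.0      # hypothetical region: the unit ball

def f(x, y, z):
    return 1.0                               # hypothetical integrand

def f_prime(x, y, z):
    """The extended integrand of (4.51a): f inside Omega, zero elsewhere."""
    return f(x, y, z) if in_omega(x, y, z) else 0.0

# Covering region Omega': the box (-1, 1)^3, of volume 8.
lo, hi = -1.0, 1.0
box_volume = (hi - lo) ** 3

rng = random.Random(2718)
N = 20000
vals = [f_prime(*(lo + (hi - lo) * rng.random() for _ in range(3)))
        for _ in range(N)]
mean = sum(vals) / N
var = sum((v - mean) ** 2 for v in vals) / N

I_bar = box_volume * mean
delta = box_volume * math.sqrt(var / N)

print(I_bar, delta)      # exact value for this choice is 4*pi/3
```

The tighter the covering region fits around the original region, the fewer
points are wasted on the zero part of the extended integrand and the smaller
the resulting uncertainty.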
The foregoing observations should make it clear that all the standard
methods of "preparing" the integral (4.40) for the Monte Carlo averaging
process can actually be regarded as transforming the given integral to the
form (4.41). The underlying reason for this is that, in the final analysis,
our only input data in a Monte Carlo calculation are random numbers from
the uniform distribution in the unit interval. Therefore, any n-dimensional
Monte Carlo integration ultimately amounts to the calculation of the average
of some function with respect to the uniform distribution of points inside
the n-dimensional unit cube. From this point of view, the generating
formulae [e.g., the inverses of (4.44) or (4.48)] are merely equations which
help specify how the function $h(r_1,\ldots,r_n)$ is obtained from the given
integrand $f(\vec{x})$.

Regarding I, then, as the integral of a function $h(r_1,\ldots,r_n)$ over the
n-dimensional unit cube as in (4.41), our proposed importance sampling
procedure is as follows. First we set up a computer program to calculate I
by the standard Monte Carlo procedure; that is, we write a computer program
to implement the steps in (3.22) and Fig. 9, but with $f(\vec{x})$ everywhere
replaced by $h(r_1,\ldots,r_n)$ [via the chosen generating formulae for the
components of $\vec{x}$] and with $\Omega$ everywhere replaced by the n-dimensional
unit cube.

Next, we incorporate into this computer program a set of statements
which keeps track of, say, the 50 highest and the 50 lowest values of the
integrand $y = h(r_1,\ldots,r_n)$ encountered in the course of the calculation,
along with the coordinates inside the n-dimensional unit cube where these
extremal values occurred. This evidently entails setting aside $(n+1)\times 100$
storage locations [50 for the highest y-values, $n\times 50$ for their associated
coordinates, and likewise for the 50 lowest y-values and their coordinates],
along with a block of control statements which updates these locations
for each new point $(r_1,\ldots,r_n)$ generated. For example, in a three-dimensional
integration one could define the variables YHI(K), R1HI(K), R2HI(K), R3HI(K)
for K ranging from 1 to 50, with the understanding that YHI(K) always
contains the Kth highest y-value found, and (R1HI(K), R2HI(K), R3HI(K)) the
location of the point $(r_1,r_2,r_3)$ in the unit cube where that value was
found; similarly, YLO(K) would always contain the Kth lowest y-value found,
and (R1LO(K), R2LO(K), R3LO(K)) the coordinates of the corresponding point.
Then for each new random point $(r_1,r_2,r_3)$ generated inside the unit cube,
we not only incorporate its integrand value $y = h(r_1,r_2,r_3)$ into the cumulating
sums $S_1$ and $S_2$ [cf. (3.22) and Fig. 9], but we also check to see if y is
greater than YHI(50) or less than YLO(50). If, for instance, the former
were found to be the case, then the current values for YHI(50), R1HI(50),
R2HI(50), R3HI(50) would be discarded, and the high-values table would be
shifted so as to incorporate this newest high value and its coordinates at
the appropriate level.
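
The bookkeeping just described can be sketched in Python as follows. This is only an illustrative sketch, not the program of Fig. 9: the function name `monte_carlo_with_extremes`, the use of heaps in place of the shifted YHI/YLO tables, and the one-standard-deviation uncertainty $\sqrt{(\langle y^2\rangle - \langle y\rangle^2)/N}$ are assumptions made here for illustration.

```python
import heapq
import math
import random


def monte_carlo_with_extremes(h, n_dim, n_points, keep=50, seed=1):
    """Average h over the unit cube while recording the `keep` highest and
    `keep` lowest integrand values together with their coordinates."""
    rng = random.Random(seed)
    s1 = 0.0  # running sum of y
    s2 = 0.0  # running sum of y**2
    highs = []  # min-heap of (y, point); root is the smallest retained high
    lows = []   # min-heap of (-y, point); root is the largest retained low
    for _ in range(n_points):
        r = tuple(rng.random() for _ in range(n_dim))
        y = h(r)
        s1 += y
        s2 += y * y
        if len(highs) < keep:
            heapq.heappush(highs, (y, r))
        elif y > highs[0][0]:
            heapq.heapreplace(highs, (y, r))  # discard the old 50th highest
        if len(lows) < keep:
            heapq.heappush(lows, (-y, r))
        elif y < -lows[0][0]:
            heapq.heapreplace(lows, (-y, r))  # discard the old 50th lowest
    mean = s1 / n_points
    variance = max(s2 / n_points - mean * mean, 0.0)
    uncertainty = math.sqrt(variance / n_points)  # assumed uncertainty formula
    highs_sorted = sorted(highs, reverse=True)           # highest first
    lows_sorted = sorted((-neg_y, r) for neg_y, r in lows)  # lowest first
    return mean, uncertainty, highs_sorted, lows_sorted
```

Listing the coordinates in `highs_sorted` and `lows_sorted` after a preliminary run gives the raw material for the examination of extremal coordinates described next.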

Now we make "Preliminary Run #1", an initial computer run of the
above  Monte  Carlo  program  which  uses  just  enough  random  points  to  yield  a 
reasonable  estimate  of  the  uncertainty  A,  as  well  as  a reasonable  sampling 
of  the  highs  and  lows  of  the  integrand  inside  the  unit  cube.  We  next 
examine  the  coordinates  of  the  high  and  low  integrand  values  with  a view 
to  determining  if  these  extremal  values  seem  to  be  associated  with  a 
relatively narrow interval of one or more of the $r_j$-coordinates. To the
extent that such an "important" interval on any $r_j$-axis can be identified,
the idea is to apply an appropriate form of importance sampling on that
$r_j$-variable independently of the other coordinate variables.


Suppose it is found from Preliminary Run #1 that the integrand assumes
extremal values (i.e., values far from the average integrand value) whenever
$r_j$ falls inside some small subinterval $(\alpha_j,\beta_j)$ of the unit interval.
We then choose some probability density function $P_j(r_j)$ which is normalized
on and non-zero in the interval $0 \le r_j \le 1$, and which has the further property
that it is large whenever $\alpha_j < r_j < \beta_j$. In Appendix H we discuss several forms
for $P_j(r_j)$ which are suitable for various intervals $(\alpha_j,\beta_j)$. Since the
function $P_j(r_j)$ is non-zero in $0 \le r_j \le 1$, we can multiply and divide the integrand
in (4.41) by $P_j(r_j)$ to obtain

$$I = \int_0^1 dr_1 \cdots \int_0^1 P_j(r_j)\,dr_j \cdots \int_0^1 dr_n \left( \frac{h(r_1,\ldots,r_j,\ldots,r_n)}{P_j(r_j)} \right) \tag{4.52}$$


Now, in the spirit of the discussion in the first part of this section, let
us make the change of variable

$$r_j' = \int_0^{r_j} P_j(r)\,dr \equiv F_j(r_j) \tag{4.53}$$


where $F_j(r_j)$ is the distribution function corresponding to the density
function $P_j(r_j)$. From (4.53) it is seen that as $r_j$ ranges from 0 to 1,
$r_j'$ also ranges from 0 to 1; furthermore, it is seen that $dr_j'$ is precisely
equal to $P_j(r_j)\,dr_j$. Therefore, (4.52) can be written

$$I = \int_0^1 dr_1 \cdots \int_0^1 dr_j' \cdots \int_0^1 dr_n \left( \frac{h(r_1,\ldots,r_j,\ldots,r_n)}{P_j(r_j)} \right) \tag{4.54}$$

where $r_j$ in the integrand is now to be regarded as a function of $r_j'$ through
the inverse of (4.53); that is, $r_j = F_j^{-1}(r_j')$. Comparing (4.41) with (4.54),
it is clear that the Monte Carlo uncertainty associated with the latter
should be smaller than the uncertainty associated with the former; for,
the large value of the denominator in the integrand of (4.54) for $\alpha_j < r_j < \beta_j$
will moderate the extremal behavior of the numerator there, with the result
that the variance of the integrand in (4.54) will be less than the variance
of the integrand in (4.41). From a slightly different point of view, whereas
in (4.41) we pick $r_j$ randomly according to the unit uniform distribution, in
(4.54) we pick $r_j$ randomly according to the density function $P_j(r_j)$. To
"correct" for this sampling bias we must introduce a factor $1/P_j(r_j)$ into the
integrand, and this factor has been specifically chosen to "smooth" the
integrand. This is, of course, the basic philosophy of importance sampling.


If the extremal coordinates list should indicate that when $r_j$ is near
0 (or 1) the integrand assumes high extremal values, while when $r_j$ is near
1 (or 0) the integrand assumes low extremal values, then it may be better to
use a form of the antithetic variates method instead of importance sampling.
To this end we introduce the function

$$h'(r_1,\ldots,r_n) = \tfrac{1}{2}\left[\, h(r_1,\ldots,r_j,\ldots,r_n) + h(r_1,\ldots,1-r_j,\ldots,r_n) \,\right]$$

the integral of which (over the unit cube) is precisely equal to the integral
of h. By taking $h'$ as the integrand we can largely eliminate the extremal
behavior near $r_j=0$ and $r_j=1$, since the two terms in brackets will tend to
cancel each other whenever $r_j$ is near 0 or 1.
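
As a hedged illustration of this antithetic construction, the following Python sketch builds $h'$ from a given h and compares the Monte Carlo uncertainties. The names `antithetic` and `mc_estimate`, the test integrand, and the uncertainty formula $\sqrt{(\langle y^2\rangle - \langle y\rangle^2)/N}$ are assumptions introduced here, not taken from the text.

```python
import math
import random


def antithetic(h, j):
    """Return h'(r) = [h(r) + h(r with r[j] replaced by 1 - r[j])] / 2."""
    def h_prime(r):
        mirrored = list(r)
        mirrored[j] = 1.0 - mirrored[j]
        return 0.5 * (h(r) + h(mirrored))
    return h_prime


def mc_estimate(func, n_dim, n_points, seed=1):
    """Plain Monte Carlo average of func over the unit cube."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_points):
        y = func([rng.random() for _ in range(n_dim)])
        s1 += y
        s2 += y * y
    mean = s1 / n_points
    variance = max(s2 / n_points - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_points)


def h(r):
    # Hypothetical integrand: low near r[0] = 0, high near r[0] = 1.
    return 3.0 * r[0] + r[1]


plain = mc_estimate(h, 2, 10000)
paired = mc_estimate(antithetic(h, 0), 2, 10000)
print("plain     :", plain)
print("antithetic:", paired)
```

For this linear test integrand the dependence on `r[0]` cancels completely in $h'$, so the second uncertainty is markedly smaller; for a general integrand the cancellation is only partial. Note also that each evaluation of $h'$ costs two evaluations of h.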

We now incorporate this importance sampling procedure for $r_j$ into
our Monte Carlo computer program, so that the calculation will proceed
via (4.54) instead of (4.41). This will require essentially two modifications:

1° Instead of picking $r_j$ as a random number from the uniform
distribution in the unit interval, we must pick $r_j'$ as a
random number from the uniform distribution in the unit
interval and then obtain $r_j$ by inverting (4.53). [In
other words, $r_j$ is now to be picked as a random number
from the distribution defined by the density function $P_j(r_j)$.]

2° Instead of taking the integrand to be $h(r_1,\ldots,r_n)$, we take
the integrand to be $h(r_1,\ldots,r_n)/P_j(r_j)$.

In carrying out these two steps it is usually convenient to employ a
computer subroutine which takes $r_j'$ as input and which calculates and
outputs $r_j$ and $P_j(r_j)$. The discussion in Appendix H of several explicit
importance sampling density functions is given with these subroutine
requirements in mind.
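
Steps 1° and 2° can be sketched in Python as follows. The density used here, $P(r) = \Gamma e^{-\Gamma r}/(1-e^{-\Gamma})$ on the unit interval, is an assumption chosen because its distribution function inverts in closed form; it is not claimed to be one of the Appendix H forms. The names `peaked_at_zero`, `h`, and `estimate`, and the test integrand, are likewise hypothetical.

```python
import math
import random


def peaked_at_zero(r_prime, gamma):
    """Subroutine of the kind described above: take uniform r' and return
    (r, P(r)) for the assumed density P(r) = gamma*exp(-gamma*r)/(1 - exp(-gamma))."""
    norm = 1.0 - math.exp(-gamma)
    r = -math.log(1.0 - r_prime * norm) / gamma  # r = F^{-1}(r'), cf. (4.53)
    p = gamma * math.exp(-gamma * r) / norm
    return r, p


def h(r1, r2):
    # Hypothetical integrand with extremal values near r1 = 0.
    return 10.0 * math.exp(-10.0 * r1) * (1.0 + r2)


def estimate(n_points, gamma=None, seed=1):
    """Average h over the unit square; if gamma is given, importance
    sample r1 (step 1) and divide the integrand by P(r1) (step 2)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_points):
        u1, r2 = rng.random(), rng.random()
        if gamma is None:
            y = h(u1, r2)
        else:
            r1, p = peaked_at_zero(u1, gamma)
            y = h(r1, r2) / p
        s1 += y
        s2 += y * y
    mean = s1 / n_points
    variance = max(s2 / n_points - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_points)


print("uniform sampling   :", estimate(20000))
print("importance sampling:", estimate(20000, gamma=8.0))
```

Both runs estimate the same integral, $1.5\,(1-e^{-10})$ for this test integrand; the second should report a noticeably smaller uncertainty because the ratio $h/P$ varies far less than h itself.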


It should be clear that we can carry out this single variable importance
sampling procedure simultaneously for as many of the variables $r_1,r_2,\ldots,r_n$
as might seem to require it. In general we end up with an expression of
the form (4.54) with all the appropriate differentials primed and with
the given integrand divided by the product of all the importance sampling
density functions being used. We then modify the original Monte Carlo
computer program according to steps 1° and 2° above for each variable
being importance sampled, and we make Preliminary Run #2, using the same
number of random points as was used in Preliminary Run #1. By comparing
the uncertainty obtained in the second run with that obtained in the
first, we can directly assess the effect of the importance sampling.
Further, by examining the coordinates of the new extremal integrand values,
we can again investigate whether these values seem to be associated with
a small range of any of the $r_j$ coordinates. We may find that one or more
variables not being importance sampled now seem to require it, and/or we
may find that variables which are being importance sampled require some
adjustment in the form of the density function being used. The latter is
a particularly frequent finding, and is usually handled most expeditiously
by adjusting the value of some parameter which controls the amount of
density function peaking. For example, for an integrand which assumes
extremal values near $r_j=0$, one might try [cf. (H.4)]


[expression of the form (H.4), with peaking parameter $\Gamma > 0$, not legible in this copy]


the peaking of which around $r_j=0$ is roughly proportional to $\Gamma$. The idea
is to take $\Gamma$ large enough so that low $r_j$-values contribute less to the
variance of the integrand; however, if $\Gamma$ is taken too large, then one
will find high $r_j$-values contributing extremal integrand values, thereby
increasing the variance. Clearly some experimentation will be required to
discover a reasonably optimal value for $\Gamma$.
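
The kind of experimentation meant here can be sketched as a small parameter scan. The density $P(r) = \Gamma e^{-\Gamma r}/(1-e^{-\Gamma})$, the one-dimensional test integrand, and the names `sample_peaked`, `h`, and `uncertainty` are all assumptions made for illustration; the density is not claimed to be the form (H.4).

```python
import math
import random


def sample_peaked(r_prime, gamma):
    """Map uniform r' to r distributed as the assumed peaked density; return (r, P(r))."""
    norm = 1.0 - math.exp(-gamma)
    r = -math.log(1.0 - r_prime * norm) / gamma
    return r, gamma * math.exp(-gamma * r) / norm


def h(r):
    # Hypothetical integrand, sharply peaked near r = 0.
    return math.exp(-10.0 * r) * (1.0 + 0.5 * r)


def uncertainty(gamma, n_points=20000, seed=1):
    """Monte Carlo uncertainty of the importance-sampled estimate for a given gamma."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_points):
        r, p = sample_peaked(rng.random(), gamma)
        y = h(r) / p
        s1 += y
        s2 += y * y
    mean = s1 / n_points
    return math.sqrt(max(s2 / n_points - mean * mean, 0.0) / n_points)


# One short "preliminary run" per trial value of the peaking parameter.
scan = {gamma: uncertainty(gamma) for gamma in (1.0, 5.0, 10.0, 20.0, 40.0)}
best = min(scan, key=scan.get)
for gamma, value in scan.items():
    print(f"gamma = {gamma:5.1f}   uncertainty = {value:.3e}")
print("smallest uncertainty at gamma =", best)
```

Too little peaking leaves the low-r values dominating the variance, and too much peaking lets rare high-r points produce very large values of $h/P$; the scan shows both effects.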


Generally  speaking,  it  will  take  several  "preliminary  runs"  to 
settle  on  a good  set  of  importance  sampling  density  functions.  One  must 
take  care  not  to  make  so  many  preliminary  runs  that  one  uses  as  much  or 
more  time  than  the  final  importance  sampling  scheme  will  save.  Usually 
3 to 10 preliminary runs should enable one to do about all that can be
done by this method. Then, of course, one makes a final long run, or as
recommended in Sec. 3-3 four final long runs, using as high a value for N
as time and money will permit to obtain the final Monte Carlo estimate of
the integral. To save computer storage space and execution time, one
can remove the extremal values bookkeeping machinery from the program
before making the final calculational run(s).

In striving to reduce the uncertainty by the foregoing importance
sampling procedure, one should not expect to achieve the minimum uncertainty
allowed by the general theory of importance sampling, which was discussed
in the preceding section [see (4.38)]. The reason is that this simple
procedure cannot affect extremal behavior caused by correlations between two
or more variables. One way to see this is to observe that the most general
importance sampling density function for (4.41) which can be realized by the
simple method described here has the form

$$P(r_1,\ldots,r_n) = P_1(r_1)\,P_2(r_2)\cdots P_n(r_n) \tag{4.55a}$$

However, as we saw in the last section [see (4.37) and also Appendix G],
the importance sampling density function which minimizes the Monte Carlo
uncertainty for (4.41) has the form

$$P_{\min}(r_1,\ldots,r_n) = \text{const} \times |h(r_1,\ldots,r_n)| \tag{4.55b}$$

That $P(r_1,\ldots,r_n)$ in (4.55a) is not generally capable of representing
$P_{\min}(r_1,\ldots,r_n)$ in (4.55b) is obvious.

Despite the fact that this simple one-variable importance sampling
procedure can achieve at best only a partial reduction in the uncertainty,
this is often sufficient, and is usually better than nothing at all. To
improve upon this method one would have to search for correlations involving
two or more variables, but this gets rather involved. For example, in
order to conduct a systematic search for two-variable correlations, that is
for integrand extrema caused by two variables $r_i$ and $r_j$ together satisfying
some condition, one would have to make n(n-1)/2 two-dimensional scatter
plots of the extremal coordinates, one plot for each distinct $(r_i,r_j)$ pair.
This is obviously much more complicated than making n one-dimensional
histograms of the extremal coordinates, which is essentially what we do in the
one-variable importance sampling procedure. A correlation between $r_i$ and
$r_j$ causing extremal integrand values will show up as a clustering of the
extremal coordinate points along some narrow band in the $r_i r_j$ scatter plot,
just as a clustering in some narrow interval of the $r_j$ histogram would
indicate an extremal-producing region of the $r_j$-axis in the one-variable approach.
If such a band is discovered in the $r_i r_j$ scatter plot, one can try to find
some two-variable probability density function $P_{ij}(r_i,r_j)$ which is peaked
along this band; one then generates $r_i$ and $r_j$ randomly according to $P_{ij}(r_i,r_j)$
instead of uniformly, and divides the integrand by $P_{ij}(r_i,r_j)$, in exact
analogy with the one-variable procedure. Alternatively, one might try to
find some transformation of variables $(r_i,r_j)\to(\rho_i,\rho_j)$ which transforms
the extremal-producing correlation into one involving only $\rho_i$, and one
could then apply the one-variable importance sampling procedure to $\rho_i$.

As a simple example of a two-variable correlation, suppose it is
found that the integrand $h(r_1,\ldots,r_n)$ assumes extremal values whenever
$r_1 \approx r_2$. This would show up as a clustering of points around the line
$r_1 = r_2$ in the $r_1 r_2$ scatter plot of the extremal value coordinates; in fact,
if one were sufficiently observant one could probably spot this particular
correlation in the coordinate listings of the one-variable importance
sampling procedure. One way to deal with this correlation would be to
construct a probability density function $P(r_1,r_2)$ which is normalized
on and non-zero in the $r_1 r_2$ unit square, and which is peaked along the line
$r_1 = r_2$; one then would generate $r_1$ and $r_2$ randomly according to $P(r_1,r_2)$,
instead of uniformly, and divide the integrand by $P(r_1,r_2)$. However, a
simpler method in this case might be to introduce the change of variables
$(r_1,r_2)\to(\rho_1,\rho_2)$ defined by

$$\rho_1 = \tfrac{1}{2} + \tfrac{1}{2}(r_1 - r_2), \qquad \rho_2 = \tfrac{1}{2}(r_1 + r_2) \tag{4.56}$$

It is easily verified that this transformation carries the $r_1 r_2$ unit
square into the $\rho_1\rho_2$ unit square, and that $\partial(\rho_1,\rho_2)/\partial(r_1,r_2) = 1/2$;
hence we can simply replace $dr_1\,dr_2$ in the integral (4.41) by $2\,d\rho_1\,d\rho_2$.
We then proceed to generate $\rho_1$ and $\rho_2$ uniformly in the unit interval,
with $r_1$ and $r_2$ determined by inverting the above formulae, namely
$r_1 = \rho_2 + \rho_1 - \tfrac{1}{2}$ and $r_2 = \rho_2 - \rho_1 + \tfrac{1}{2}$. Since the image of the
$r_1 r_2$ unit square covers only half of the $\rho_1\rho_2$ unit square, the integrand
must be set to zero whenever the point $(r_1,r_2)$ so obtained falls outside
the unit square [cf. (4.51a)]. The point
here is that the $r_1 = r_2$ correlation will now appear as an extremal condition
associated with $\rho_1 \approx 1/2$ independently of $\rho_2$; this can be easily handled
by applying single-variable importance sampling to $\rho_1$.
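
The change of variables (4.56) can be checked numerically with a short Python sketch. The test integrand, which is peaked along $r_1 = r_2$, and the names `rho_to_r`, `g`, and `mc_estimate` are assumptions introduced here for illustration.

```python
import math
import random


def rho_to_r(rho1, rho2):
    """Invert (4.56): recover (r1, r2) from (rho1, rho2)."""
    return rho2 + rho1 - 0.5, rho2 - rho1 + 0.5


def h(r1, r2):
    # Hypothetical integrand with extremal values along the line r1 = r2.
    return math.exp(-200.0 * (r1 - r2) ** 2)


def g(rho1, rho2):
    """Integrand in the new variables: 2*h inside the image of the unit
    square, zero for points that map outside it."""
    r1, r2 = rho_to_r(rho1, rho2)
    if not (0.0 <= r1 <= 1.0 and 0.0 <= r2 <= 1.0):
        return 0.0
    return 2.0 * h(r1, r2)  # factor 2 from dr1 dr2 = 2 drho1 drho2


def mc_estimate(func, n_points, seed=1):
    """Plain Monte Carlo average of a two-variable function over the unit square."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_points):
        y = func(rng.random(), rng.random())
        s1 += y
        s2 += y * y
    mean = s1 / n_points
    variance = max(s2 / n_points - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_points)


print("original variables   :", mc_estimate(h, 20000))
print("transformed variables:", mc_estimate(g, 20000))
```

The two estimates agree within their uncertainties. The transformation by itself does not reduce the uncertainty; its purpose is to make the extremal behavior depend on `rho1` alone, so that a one-variable density peaked near `rho1 = 1/2` can then be applied.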

Two-variable importance sampling is obviously a much more complicated
and demanding enterprise than one-variable importance sampling, but
it  is  nevertheless  quite  feasible.  It  does  not,  however,  appear  to 
be  feasible  at  this  time  to  attempt  an  analogous  systematic  search  for 
and  treatment  of  correlations  involving  more  than  two  variables.  Indeed, 
if  one-variable  importance  sampling  proves  inadequate,  then  even  before 
attempting  two-variable  importance  sampling  one  probably  should  investigate 
the  possibility  of  using  one  of  the  other  three  variance  reducing  techniques, 
perhaps  in  conjunction  with  one-variable  importance  sampling. 




Appendix  A 

PROOF  OF  THE  REJECTION  METHOD  FOR  GENERATING  RANDOM  NUMBERS 


Suppose we are given a set of random numbers $\{x_i'\}$ distributed according
to the density function $P_1(x)$, and also a set of random numbers $\{r_i\}$
distributed uniformly in the unit interval. Let $P_2(x)$ be any non-negative
integrable (but not necessarily normalized) function, which is bounded by
the finite number $B_2$; more specifically, we require $0 \le P_2(x) \le B_2$ everywhere
that $P_1(x)$ is non-vanishing. Suppose we now construct a subset $\{x_i\}$ of
the set $\{x_i'\}$ by the following procedure: Draw a random pair $x_i'$ and $r_i$,
and take $x_i'$ to be a member of $\{x_i\}$ if and only if

$$P_2(x_i')/B_2 \ge r_i \tag{A.1}$$

This process is repeated over and over, using a new pair $x_i'$ and $r_i$ each
time, with $x_i'$ being made an element of $\{x_i\}$ whenever (A.1) is found to be
satisfied. We shall now prove that the set $\{x_i\}$ constructed in this way
is a set of random numbers distributed according to the density function

$$P(x) = \frac{P_1(x)\,P_2(x)}{\displaystyle\int_{-\infty}^{\infty} P_1(x')\,P_2(x')\,dx'} \tag{A.2}$$

and moreover that the efficiency E of this generating process, i.e.,
the probability that an arbitrarily chosen element of the set $\{x_i'\}$ will
be taken to be an element of the set $\{x_i\}$, is

$$E = \frac{1}{B_2}\int_{-\infty}^{\infty} P_1(x')\,P_2(x')\,dx' \tag{A.3}$$

Let $P_0(x)\,dx$ be the probability that a random draw from the set
$\{x_i'\}$, accompanied by a random draw from $\{r_i\}$, will produce an element
of the set $\{x_i\}$ which lies between x and x+dx. $P_0(x)\,dx$ can be expressed
in two different ways. From one point of view we can write $P_0(x)\,dx$ as
the product of: (i) the probability that a randomly chosen element $x_i'$
will become a member of the set $\{x_i\}$, times (ii) the probability that
a member of the set $\{x_i\}$ will lie between x and x+dx. By definition,
the probability (i) is E and the probability (ii) is $P(x)\,dx$; hence,

$$P_0(x)\,dx = [E] \times [P(x)\,dx] \tag{A.4}$$

From another point of view, we can write $P_0(x)\,dx$ as the product of:
(iii) the probability that a randomly chosen element $x_i'$ will lie between
x and x+dx, times (iv) the probability that an $x_i'$-value which lies between
x and x+dx will be accepted as a member of the set $\{x_i\}$. By definition
the probability (iii) is $P_1(x)\,dx$. To find an expression for the probability
(iv), we note that it is just the probability that an $x_i'$-value which lies
between x and x+dx will satisfy the acceptance criterion (A.1); in other
words, (iv) is the probability that a randomly chosen element $r_i$ will
be less than the quantity $P_2(x)/B_2$ (which by hypothesis lies between
zero and unity). This probability is precisely $P_2(x)/B_2$, since the
probability for an element from $\{r_i\}$ to be less than r, for $0 \le r \le 1$, is just
F(r)=r [cf. (2.8b)]. Hence, our second expression for $P_0(x)\,dx$ is

$$P_0(x)\,dx = [P_1(x)\,dx] \times [P_2(x)/B_2] \tag{A.5}$$




Now, since P(x) is by definition a properly normalized density
function, we have from (A.4)

$$\int_{-\infty}^{\infty} P_0(x)\,dx = E \int_{-\infty}^{\infty} P(x)\,dx = E$$

But from (A.5) we also have

$$\int_{-\infty}^{\infty} P_0(x)\,dx = \frac{1}{B_2}\int_{-\infty}^{\infty} P_1(x)\,P_2(x)\,dx$$

Combining the last two equations yields at once the expression for E
in (A.3). Now, (A.4) implies that

$$P(x) = \frac{P_0(x)}{E}$$

Evaluating the numerator from (A.5) and the denominator from (A.3) (which
has just been established) gives

$$P(x) = \frac{(1/B_2)\,P_1(x)\,P_2(x)}{(1/B_2)\displaystyle\int_{-\infty}^{\infty} P_1(x')\,P_2(x')\,dx'}$$

thus establishing (A.2). QED.

The "rejection method" presented in Sec. 2-3 is now obtained as a
special case of the above procedure: For if we take $\{x_i'\}$ to be uniformly
distributed over the finite interval $a \le x \le b$, so that

$$P_1(x) = \begin{cases} 1/(b-a), & \text{for } a \le x \le b \\ 0, & \text{otherwise} \end{cases} \tag{A.6}$$

then according to (A.2) the set $\{x_i\}$ will be distributed according to
the density function

$$P(x) = \begin{cases} P_2(x) \Big/ \displaystyle\int_a^b P_2(x')\,dx', & \text{for } a \le x \le b \\ 0, & \text{otherwise} \end{cases} \tag{A.7}$$

and according to (A.3) the generating efficiency will be

$$E = \frac{\displaystyle\int_a^b P_2(x')\,dx'}{B_2\,(b-a)} \tag{A.8}$$

Thus, the set $\{x_i\}$ is distributed over the interval $a \le x \le b$ according to
$C\,P_2(x)$, where C is a normalization constant, and the fraction of the
$x_i'$'s which are accepted as elements of the set $\{x_i\}$ is just the ratio
of the area under $P_2(x)$ between a and b, to the area under the enclosing
box of height $B_2$ between a and b.
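
The acceptance rule (A.1) and the efficiency formula (A.8) can be illustrated with a short Python sketch. The function name `rejection_sample` and the test case ($P_2(x) = x^2$ on the unit interval with $B_2 = 1$) are assumptions chosen for illustration; for that case (A.8) gives E = 1/3, and the accepted numbers follow the density $3x^2$, whose mean is 3/4.

```python
import random


def rejection_sample(draw_x, p2, b2, n_trials, rng):
    """Apply criterion (A.1) to n_trials candidate pairs.

    draw_x(rng) returns a candidate x' distributed according to P1;
    p2 is the (unnormalized) function P2, bounded above by b2.
    Returns the accepted values and the observed acceptance fraction."""
    accepted = []
    for _ in range(n_trials):
        x = draw_x(rng)        # candidate x'_i from the density P1
        r = rng.random()       # r_i uniform in the unit interval
        if p2(x) / b2 >= r:    # acceptance criterion (A.1)
            accepted.append(x)
    return accepted, len(accepted) / n_trials


a, b = 0.0, 1.0
rng = random.Random(1)
accepted, efficiency = rejection_sample(
    lambda gen: gen.uniform(a, b),  # P1 uniform on [a, b], as in (A.6)
    lambda x: x * x,                # hypothetical P2
    1.0,                            # bound B2 on P2 over [a, b]
    30000,
    rng,
)
sample_mean = sum(accepted) / len(accepted)
print("observed efficiency:", efficiency, " predicted by (A.8):", 1.0 / 3.0)
print("mean of accepted x :", sample_mean, " predicted:", 0.75)
```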




Appendix  B 


THE  JACOBIAN 

Consider the transformation T from xyz-space to uvw-space, defined by

$$T:\quad \begin{aligned} u &= U(x,y,z) \\ v &= V(x,y,z) \\ w &= W(x,y,z) \end{aligned} \tag{B.1}$$

We assume that the inverse transformation $T^{-1}$ exists, so that the equations
in (B.1) can in principle be "solved" for x, y and z in terms of u, v and w:

$$\begin{aligned} x &= X(u,v,w) \\ y &= Y(u,v,w) \\ z &= Z(u,v,w) \end{aligned} \tag{B.2}$$

Let $\hat{x},\hat{y},\hat{z}$ and $\hat{u},\hat{v},\hat{w}$ be the orthogonal unit vectors in the two spaces (see
Fig. 10). Let P=(x,y,z) be any point in xyz-space, and let P'=(u,v,w) be
the image of P under T in uvw-space. Let $d\tau$ be the differential (cubic)
volume element in xyz-space built upon the three vectors $\hat{x}\,dx$, $\hat{y}\,dy$, $\hat{z}\,dz$
emanating from P, and let $d\tau'$ be the image of $d\tau$ under T in uvw-space.
We wish to find out how the volume $d\tau'$ compares with the volume $d\tau\,(=dx\,dy\,dz)$.
For this we must first find the images $\vec{e}_x$, $\vec{e}_y$, $\vec{e}_z$ of the respective vectors
$\hat{x}\,dx$, $\hat{y}\,dy$, $\hat{z}\,dz$ under T; we can then calculate $d\tau'$ as the volume of the
parallelepiped built upon $\vec{e}_x$, $\vec{e}_y$ and $\vec{e}_z$.

Let Q be the point

$$Q = (x+dx,\,y,\,z) = (x,y,z) + (dx,0,0) = P + \hat{x}\,dx.$$

FIGURE 10. Deformation of the differential cubic volume element dxdydz
under a hypothetical transformation T from xyz-space to uvw-space.
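
The image of Q under T is evidently

$$\begin{aligned} Q' &= \left(u+\frac{\partial u}{\partial x}dx,\; v+\frac{\partial v}{\partial x}dx,\; w+\frac{\partial w}{\partial x}dx\right) \\ &= (u,v,w) + \left(\frac{\partial u}{\partial x}dx,\; \frac{\partial v}{\partial x}dx,\; \frac{\partial w}{\partial x}dx\right) \\ &= P' + \left[\hat{u}\frac{\partial u}{\partial x}dx + \hat{v}\frac{\partial v}{\partial x}dx + \hat{w}\frac{\partial w}{\partial x}dx\right] \end{aligned}$$

But since $Q' = P' + \vec{e}_x$ (see Fig. 10), we may conclude that the
image $\vec{e}_x$ of $\hat{x}\,dx$ under T is

$$\vec{e}_x = \hat{u}\frac{\partial u}{\partial x}dx + \hat{v}\frac{\partial v}{\partial x}dx + \hat{w}\frac{\partial w}{\partial x}dx \tag{B.3a}$$

It is of course understood that all the partial derivatives here are
evaluated via (B.1) at the point P. In the same way we find

$$\vec{e}_y = \hat{u}\frac{\partial u}{\partial y}dy + \hat{v}\frac{\partial v}{\partial y}dy + \hat{w}\frac{\partial w}{\partial y}dy \tag{B.3b}$$

and

$$\vec{e}_z = \hat{u}\frac{\partial u}{\partial z}dz + \hat{v}\frac{\partial v}{\partial z}dz + \hat{w}\frac{\partial w}{\partial z}dz \tag{B.3c}$$

Now, the volume of the parallelepiped built upon any three vectors
emanating from the same point is just the absolute value of the so-called
"triple scalar product" of these vectors. Thus, we calculate $d\tau'$ as

$$d\tau' = \left|\, \vec{e}_x \cdot (\vec{e}_y \times \vec{e}_z) \,\right| \tag{B.4}$$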
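
The triple scalar product can be written in terms of the orthogonal
components of the three vectors in the form of a determinant:

$$\vec{e}_x \cdot (\vec{e}_y \times \vec{e}_z) = \begin{vmatrix} (\vec{e}_x)_u & (\vec{e}_x)_v & (\vec{e}_x)_w \\ (\vec{e}_y)_u & (\vec{e}_y)_v & (\vec{e}_y)_w \\ (\vec{e}_z)_u & (\vec{e}_z)_v & (\vec{e}_z)_w \end{vmatrix}$$

Inserting the specific components from (B.3) we find

$$\vec{e}_x \cdot (\vec{e}_y \times \vec{e}_z) = \frac{\partial(u,v,w)}{\partial(x,y,z)}\,dx\,dy\,dz \tag{B.5}$$

where we have defined the Jacobian of the transformation (B.1) by

$$\frac{\partial(u,v,w)}{\partial(x,y,z)} \equiv \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial v}{\partial x} & \dfrac{\partial w}{\partial x} \\[2mm] \dfrac{\partial u}{\partial y} & \dfrac{\partial v}{\partial y} & \dfrac{\partial w}{\partial y} \\[2mm] \dfrac{\partial u}{\partial z} & \dfrac{\partial v}{\partial z} & \dfrac{\partial w}{\partial z} \end{vmatrix} \tag{B.6}$$

Again we note that all the partial derivatives here are evaluated via
(B.1) at the point P. Putting (B.5) into (B.4), and noting that $dx\,dy\,dz = d\tau$,
we conclude that

$$d\tau' = \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| d\tau \tag{B.7}$$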
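According to (B.7), the absolute value of the Jacobian (B.6) gives the
local volume expansion $d\tau'/d\tau$ produced by the transformation T from
xyz-space to uvw-space. By the same token, we may assert that the
absolute value of

$$\frac{\partial(x,y,z)}{\partial(u,v,w)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} & \dfrac{\partial z}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} & \dfrac{\partial z}{\partial v} \\[2mm] \dfrac{\partial x}{\partial w} & \dfrac{\partial y}{\partial w} & \dfrac{\partial z}{\partial w} \end{vmatrix}$$

in which all of the partial derivatives are evaluated via (B.2) at the
point P', gives the local volume expansion accompanying the inverse
transformation $T^{-1}$. Since the net local expansion involved in the successive
transformations $(x,y,z) \xrightarrow{T} (u,v,w) \xrightarrow{T^{-1}} (x,y,z)$ must obviously be unity,
we have

$$\left| \frac{\partial(x,y,z)}{\partial(u,v,w)} \right| \cdot \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| = 1$$

whence

$$\left| \frac{\partial(x,y,z)}{\partial(u,v,w)} \right| = 1 \Bigg/ \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| \tag{B.8}$$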


Suppose now we have an integral of the form

$$I = \iiint_{R'} f(u,v,w)\,du\,dv\,dw$$

where f is some function defined in some region R' of uvw-space.
Heuristically, this integral can be thought of in terms of a partition
of uvw-space into infinitesimal volume elements, with the value of the
integral being the number obtained by first multiplying the value of f
in each infinitesimal volume element by the size of that volume element,
and then summing over all elements inside R'. Now, if R' is the image
under T of a region R in xyz-space, or equivalently if R is the image
of R' under $T^{-1}$, then one way of partitioning R' would be to proceed
as follows: Partition R into infinitesimal volume elements $dx\,dy\,dz = d\tau$,
and then take the volume elements in R' to be the corresponding images
$d\tau'$ of the elements $d\tau$ under T. Then we would have

$$\iiint_{R'} f(u,v,w)\,du\,dv\,dw = \iiint_{R'} f(u,v,w)\,d\tau'$$

where, on the right, u, v and w denote the location of $d\tau'$. Thus, using
(B.7) and the fact that $d\tau = dx\,dy\,dz$, we conclude

$$\iiint_{R'} f(u,v,w)\,du\,dv\,dw = \iiint_{R} f\big(U(x,y,z),V(x,y,z),W(x,y,z)\big) \left| \frac{\partial(u,v,w)}{\partial(x,y,z)} \right| dx\,dy\,dz \tag{B.9}$$

This equation shows how the integral on the left "transforms" under the
transformation $T^{-1}$ from uvw-space to xyz-space; it is the rule for
"changing integration variables" in a multi-dimensional integral.

It should be apparent from the above results, (B.7) and (B.9), that
the Jacobian $\partial(u,v,w)/\partial(x,y,z)$ of the transformation T in (B.1) is the
three-dimensional analogue of the derivative du/dx of the one-dimensional
transformation u=U(x). Indeed, the definition (B.6) shows that
$\partial(u,v,w)/\partial(x,y,z)$ automatically reduces to du/dx in the one-dimensional
case.
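
The reciprocal relation (B.8) can be checked numerically for a concrete transformation. In the Python sketch below, the transformation (u = x e^z, v = x + y, w = 2z), its inverse, and the names `forward`, `inverse`, and `jacobian_det` are all assumptions chosen for illustration; the partial derivatives are approximated by central differences.

```python
import math


def forward(p):
    """Hypothetical transformation T: (x, y, z) -> (u, v, w)."""
    x, y, z = p
    return (x * math.exp(z), x + y, 2.0 * z)


def inverse(q):
    """Inverse transformation: (u, v, w) -> (x, y, z)."""
    u, v, w = q
    x = u * math.exp(-0.5 * w)
    return (x, v - x, 0.5 * w)


def jacobian_det(func, point, step=1e-6):
    """Determinant arranged as in (B.6): row i holds the derivatives of the
    three output coordinates with respect to input coordinate i."""
    rows = []
    for i in range(3):
        plus = list(point)
        minus = list(point)
        plus[i] += step
        minus[i] -= step
        f_plus, f_minus = func(plus), func(minus)
        rows.append([(f_plus[k] - f_minus[k]) / (2.0 * step) for k in range(3)])
    a, b, c = rows
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


p = (0.3, -0.7, 0.4)
j_forward = jacobian_det(forward, p)           # analytic value: 2*exp(z)
j_inverse = jacobian_det(inverse, forward(p))  # evaluated at the image point P'
print("forward Jacobian:", j_forward)
print("inverse Jacobian:", j_inverse)
print("product         :", j_forward * j_inverse)
```

The product of the two determinants comes out equal to unity to within the finite-difference error, as (B.8) requires.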

Appendix C


ADDING INDEPENDENT RANDOM NUMBERS

Let $\{z_{1,i}\}$ and $\{z_{2,i}\}$ be two sets of random real numbers with
density functions $P_1(z)$ and $P_2(z)$ respectively. The mean $m_k$ and variance
$\sigma_k^2$ of the set $\{z_{k,i}\}$ (k=1,2) are then given by [cf. (3.9) and (3.10)]

$$m_k = \int z\,P_k(z)\,dz, \qquad \sigma_k^2 = \int (z-m_k)^2 P_k(z)\,dz = \int z^2 P_k(z)\,dz - m_k^2 \tag{C.1}$$

Suppose we construct a new set of random numbers $\{Z_i\}$ by drawing a random
number from each of the two given sets and forming their sum,

$$Z_i = z_{1,i} + z_{2,i} \tag{C.2}$$

Assuming that the draws from the two sets are statistically independent,
in that the probability for obtaining any value for $z_{2,i}$ depends only
on $P_2$ and not on the value obtained for $z_{1,i}$, then the density function
P(Z) of the new set $\{Z_i\}$ is determined by the following statement: The
probability for $Z_i$ to fall in the interval dZ about Z is equal to the
product of {the probability for $z_{1,i}$ to fall in the interval $dz_1$ about
$z_1$} times {the probability for $z_{2,i}$ to fall in the interval $d(Z-z_1)$ about
$(Z-z_1)$}, summed over all values of $z_1$. In mathematical terms we therefore
have

$$\begin{aligned} P(Z)\,dZ &= \int_{z_1} P_1(z_1)\,dz_1 \cdot P_2(Z-z_1)\,d(Z-z_1) \\ &= \int dz_1\,P_1(z_1)\,P_2(Z-z_1)\,dZ \end{aligned}$$

whence

$$P(Z) = \int dz_1\,P_1(z_1)\,P_2(Z-z_1) \tag{C.3}$$

We may thus compute the mean M and variance $\Sigma^2$ of the set $\{Z_i\}$ as follows:

$$\begin{aligned} M &= \int Z\,P(Z)\,dZ = \int dZ\,[Z-z_1+z_1] \int dz_1\,P_1(z_1)\,P_2(Z-z_1) \\ &= \int d(Z-z_1) \int dz_1\,[(Z-z_1)+z_1]\,P_1(z_1)\,P_2(Z-z_1) \end{aligned}$$

$$M = \int dz_2 \int dz_1\,[z_2+z_1]\,P_1(z_1)\,P_2(z_2) \tag{C.4}$$

$$\begin{aligned} \Sigma^2 &= \int Z^2 P(Z)\,dZ - M^2 = \int dZ\,[Z-z_1+z_1]^2 \int dz_1\,P_1(z_1)\,P_2(Z-z_1) - M^2 \\ &= \int d(Z-z_1) \int dz_1\,[(Z-z_1)+z_1]^2\,P_1(z_1)\,P_2(Z-z_1) - M^2 \end{aligned}$$

$$\Sigma^2 = \int dz_2 \int dz_1\,[z_2+z_1]^2\,P_1(z_1)\,P_2(z_2) - M^2 \tag{C.5}$$

Recognizing that the $z_1$ and $z_2$ integrations in (C.4) and (C.5) are
independent of each other, and moreover that $\int dz_1 P_1(z_1) = \int dz_2 P_2(z_2) = 1$,
it is a simple matter to carry out these integrations using the definitions
in (C.1). The results are

$$M = m_1 + m_2 \tag{C.6}$$

$$\Sigma^2 = \sigma_1^2 + \sigma_2^2 \tag{C.7}$$

Thus, provided the elements $z_{1,i}$ and $z_{2,i}$ are drawn independently of
one another, the mean and variance of the set $\{z_{1,i}+z_{2,i}\}$ will be
the sums of the respective means and variances of the sets $\{z_{1,i}\}$ and
$\{z_{2,i}\}$. Notice in particular that it is the variances $\sigma_k^2$, not the
standard deviations $\sigma_k$, that add; this has the consequence that $\Sigma$ is
always somewhat less than $\sigma_1+\sigma_2$.

By adding elements from a third set $\{z_{3,i}\}$ to $\{Z_i\}$, we see that
the set of summed elements will have mean $(m_1+m_2)+m_3$ and variance
$(\sigma_1^2+\sigma_2^2)+\sigma_3^2$. In general, if each element of the set $\{Z_i\}$ is obtained
by summing N independently drawn elements, one from each of the N sets $\{z_{1,i}\}$,
$\{z_{2,i}\}$, ..., $\{z_{N,i}\}$, then the mean M and variance $\Sigma^2$ of the set $\{Z_i\}$
will be

$$M = m_1 + m_2 + \cdots + m_N \tag{C.8}$$

$$\Sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2 \tag{C.9}$$

where $m_k$ and $\sigma_k^2$ are the mean and variance respectively of the set $\{z_{k,i}\}$.

In particular, suppose that the N sets of random numbers $\{z_{1,i}\}$,
$\{z_{2,i}\}$, ..., $\{z_{N,i}\}$ all have the same density function, or in other words,
suppose they are all the same set $\{z_i\}$ with mean m and variance $\sigma^2$:

$$m_1 = m_2 = \cdots = m_N = m, \qquad \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_N^2 = \sigma^2 \tag{C.10}$$


In  this  case,  each  element  of  the  set  {Z^}  can  be  regarded  as  the  sum 
of  N independently  drawn  elements  from  the  same  set  of  random  numbers 
{z^}.  According  to  (C.8)  and  (C.9),  the  mean  M and  variance  £2  of  the 
set  {Z^}  will  then  be  given  by 


M = m + m +. . .+  m = Nm  (C.ll) 

£2  = a2  + a2  +...+  a2  = N a2  (C.12) 

Define now the new set of random numbers $\{\bar{Z}_i\}$ by the rule

$$\bar{Z}_i = Z_i/N \tag{C.13}$$

It is easy to show that the mean $\bar{M}$ and rms deviation $\bar{\Sigma}$
of the set $\{\bar{Z}_i\}$ are just $1/N$ times the mean $M$ and rms
deviation $\Sigma$, respectively, of the set $\{Z_i\}$. Thus, in view of
(C.11) and (C.12) we get

$$\bar{M} = M/N = (Nm)/N = m \tag{C.14}$$

and

$$\bar{\Sigma} = \Sigma/N = (N\sigma^2)^{1/2}/N = \sigma/N^{1/2} \tag{C.15}$$

We can interpret these results in the following way: If we let $\bar{Z}$ be
the "average" of N independently drawn elements from the set $\{z_i\}$,

$$\bar{Z} = (z_1 + z_2 + \cdots + z_N)/N, \tag{C.16}$$

then $\bar{Z}$ can be regarded as an element of a set of random numbers whose
mean is equal to the mean $m$ of the set $\{z_i\}$, and whose rms deviation
is equal to the rms deviation $\sigma$ of the set $\{z_i\}$ divided by
$\sqrt{N}$. Thus, we have proved a "weak form" of the Central Limit Theorem,
which is discussed in the text in connection with equations (3.11) and
(3.12). The Central Limit Theorem further asserts that in the limit of large
N the set of random numbers $\{\bar{Z}_i\}$ becomes a Gaussian distribution;
in that case we can assign the numerical confidence limits in (3.12) [cf.
also the discussion following (3.19)], which evidently permit a much more
quantitative interpretation of the rms deviation $\sigma/\sqrt{N}$ than can
be obtained from the development given here.
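
The result (C.15) is easy to check numerically. The following is only a
minimal sketch, assuming the set $\{z_i\}$ is the uniform distribution on the
unit interval (for which $\sigma = 1/\sqrt{12}$); the function name
`rms_of_averages` and the trial count are illustrative choices, not part of
the text.

```python
import math
import random
import statistics

def rms_of_averages(n_terms, n_trials=5000, seed=1):
    """Empirical rms deviation of the average of n_terms uniform(0,1) draws."""
    rng = random.Random(seed)
    averages = [
        sum(rng.random() for _ in range(n_terms)) / n_terms
        for _ in range(n_trials)
    ]
    return statistics.pstdev(averages)

# rms deviation of a single uniform(0,1) draw (assumed parent set)
sigma = math.sqrt(1.0 / 12.0)

# Compare the observed rms deviation with the prediction sigma / sqrt(N) of (C.15)
for n in (1, 4, 16, 64):
    print(n, rms_of_averages(n), sigma / math.sqrt(n))
```

The two printed columns should agree to within the statistical scatter of the
finite number of trials.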
Appendix D


THE COVARIANCE

Let $f_1(\vec{x})$ and $f_2(\vec{x})$ be two functions defined on a region
$\Omega$, and let $f(\vec{x})$ denote their sum:

$$f(\vec{x}) = f_1(\vec{x}) + f_2(\vec{x}) \tag{D.1}$$


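The variances (or mean square deviations) of these functions with respect
to a uniform distribution in $\Omega$ are given by

$$\sigma^2 \equiv \mathrm{var}(f:P_\Omega) = \langle f^2:P_\Omega \rangle - \langle f:P_\Omega \rangle^2$$

and

$$\sigma_i^2 \equiv \mathrm{var}(f_i:P_\Omega) = \langle f_i^2:P_\Omega \rangle - \langle f_i:P_\Omega \rangle^2, \quad i = 1, 2. \tag{D.2}$$

Here, the bracket $\langle h:P_\Omega \rangle$ is defined for any function
$h(\vec{x})$ by

$$\langle h:P_\Omega \rangle \equiv \int_\Omega h(\vec{x}) P_\Omega(\vec{x})\,d\vec{x} = |\Omega|^{-1} \int_\Omega h(\vec{x})\,d\vec{x} \tag{D.3}$$

where $P_\Omega(\vec{x})$ is the density function (3.4) defining the set of
random points distributed uniformly over $\Omega$.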

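We seek a relation between the variance of $f(\vec{x})$ and the variances
of $f_1(\vec{x})$ and $f_2(\vec{x})$. By straightforward calculation
utilizing (D.2) and (D.3) we have

$$\mathrm{var}(f:P_\Omega) = \langle f^2:P_\Omega \rangle - \langle f:P_\Omega \rangle^2$$

$$= \langle (f_1^2 + 2f_1f_2 + f_2^2):P_\Omega \rangle - \langle (f_1+f_2):P_\Omega \rangle^2$$

$$= \langle f_1^2:P_\Omega \rangle + 2\langle f_1f_2:P_\Omega \rangle + \langle f_2^2:P_\Omega \rangle - \left( \langle f_1:P_\Omega \rangle + \langle f_2:P_\Omega \rangle \right)^2$$

$$= \langle f_1^2:P_\Omega \rangle - \langle f_1:P_\Omega \rangle^2 + \langle f_2^2:P_\Omega \rangle - \langle f_2:P_\Omega \rangle^2 + 2\langle f_1f_2:P_\Omega \rangle - 2\langle f_1:P_\Omega \rangle \langle f_2:P_\Omega \rangle$$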

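Defining, then, the "covariance of $f_1(\vec{x})$ and $f_2(\vec{x})$ with
respect to a uniform distribution inside $\Omega$" by

$$\mathrm{cov}(f_1,f_2:P_\Omega) \equiv \langle f_1f_2:P_\Omega \rangle - \langle f_1:P_\Omega \rangle \langle f_2:P_\Omega \rangle \tag{D.4}$$

we have the result

$$\mathrm{var}(f:P_\Omega) = \mathrm{var}(f_1:P_\Omega) + \mathrm{var}(f_2:P_\Omega) + 2\,\mathrm{cov}(f_1,f_2:P_\Omega) \tag{D.5}$$

In fact, it is easy to see that the foregoing arguments admit the slightly
more general result

$$\mathrm{var}(a_1f_1 + a_2f_2:P_\Omega) = a_1^2\,\mathrm{var}(f_1:P_\Omega) + a_2^2\,\mathrm{var}(f_2:P_\Omega) + 2a_1a_2\,\mathrm{cov}(f_1,f_2:P_\Omega) \tag{D.6}$$

where $a_1$ and $a_2$ are any two constants.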

According to (D.4), the covariance of $f_1(\vec{x})$ and $f_2(\vec{x})$ is
just the average of the product of these functions minus the product of
their averages. Comparing (D.4) with (D.2) we see that the covariance of any
function $h(\vec{x})$ with itself is just its variance:

$$\mathrm{cov}(h,h:P_\Omega) = \mathrm{var}(h:P_\Omega) \tag{D.7}$$

The variance of a function is never negative, as can be seen by writing
it in the form [cf. (3.10)]

$$\mathrm{var}(h:P_\Omega) = \langle (h - \langle h:P_\Omega \rangle)^2:P_\Omega \rangle = |\Omega|^{-1} \int_\Omega \left( h(\vec{x}) - \langle h:P_\Omega \rangle \right)^2 d\vec{x} \ge 0 \tag{D.8}$$

However, the covariance of two different functions can be either positive
or negative. To get some idea of the limits on the covariance, we introduce
the function

$$f_3(\vec{x}) \equiv \mathrm{cov}(f_1,f_2:P_\Omega)\, f_1(\vec{x}) - \mathrm{cov}(f_1,f_1:P_\Omega)\, f_2(\vec{x})$$

By using only the definitions of the variance and covariance, together
with their implied consequences (D.7) and (D.8), it is a straightforward
but slightly tedious calculation to show that the statement
$\mathrm{var}(f_3:P_\Omega) \ge 0$ implies the inequality

$$|\mathrm{cov}(f_1,f_2:P_\Omega)| \le \sqrt{\mathrm{var}(f_1:P_\Omega)} \cdot \sqrt{\mathrm{var}(f_2:P_\Omega)} = \sigma_1 \sigma_2 \tag{D.9}$$

where $\sigma_i$ is of course the rms deviation of $f_i(\vec{x})$ with
respect to a uniform distribution inside $\Omega$ [cf. (D.2)].

Combining (D.9) and (D.5) yields the result

$$|\sigma_1 - \sigma_2| \le \sigma \le \sigma_1 + \sigma_2 \tag{D.10}$$
Thus, the rms deviation of $f_1(\vec{x})+f_2(\vec{x})$ assumes its maximum
value of $\sigma_1+\sigma_2$ when the covariance of $f_1(\vec{x})$ and
$f_2(\vec{x})$ assumes its maximum value of $+\sigma_1\sigma_2$. Similarly,
the rms deviation of $f_1(\vec{x})+f_2(\vec{x})$ assumes its minimum value
of $|\sigma_1-\sigma_2|$ when the covariance of $f_1(\vec{x})$ and
$f_2(\vec{x})$ assumes its minimum value of $-\sigma_1\sigma_2$. If the
covariance of $f_1(\vec{x})$ and $f_2(\vec{x})$ happens to vanish, then the
rms deviation of $f_1(\vec{x})+f_2(\vec{x})$ will be
$\sqrt{\sigma_1^2+\sigma_2^2}$.

It is sometimes convenient to define the "correlation coefficient"
$\rho$ of $f_1(\vec{x})$ and $f_2(\vec{x})$ by

$$\rho \equiv \frac{\mathrm{cov}(f_1,f_2:P_\Omega)}{\sqrt{\mathrm{var}(f_1:P_\Omega)}\,\sqrt{\mathrm{var}(f_2:P_\Omega)}} \tag{D.11}$$

The inequality (D.9) implies that

$$-1 \le \rho \le +1 \tag{D.12}$$
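
The relations (D.5), (D.7) and (D.12) can be illustrated with sample
averages. The sketch below is only illustrative: it assumes $\Omega$ is the
unit interval and uses two arbitrary functions, $f_1(x) = x$ and
$f_2(x) = \cos 3x$, neither of which comes from the text. Because (D.5) is an
algebraic identity, it holds for the sample averages to within round-off.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Average of the product minus the product of the averages, as in (D.4)."""
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)

rng = random.Random(0)
xs = [rng.random() for _ in range(10000)]   # uniform points in Omega = [0, 1]
f1 = [x for x in xs]                         # assumed example: f1(x) = x
f2 = [math.cos(3.0 * x) for x in xs]         # assumed example: f2(x) = cos(3x)
f = [a + b for a, b in zip(f1, f2)]

var_f = cov(f, f)                                          # (D.7)
rhs = cov(f1, f1) + cov(f2, f2) + 2.0 * cov(f1, f2)        # right side of (D.5)
rho = cov(f1, f2) / math.sqrt(cov(f1, f1) * cov(f2, f2))   # (D.11)
print(var_f, rhs, rho)
```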
Appendix E


THE VARIANCE OF A FUNCTION OVER A PARTITIONED REGION


Let $f(\vec{x})$ be a function defined in a region $\Omega$, and let
$\Omega$ be partitioned into n subregions $\Omega_1, \Omega_2, \ldots,
\Omega_n$. We define $\alpha_j$ to be the ratio of the volume of $\Omega_j$
to the volume of $\Omega$:

$$\alpha_j \equiv |\Omega_j|/|\Omega| \tag{E.1}$$

The condition that the union of the n non-overlapping subregions be
equal to $\Omega$ implies that

$$\sum_{j=1}^{n} \alpha_j = 1 \tag{E.2}$$


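We have [cf. (3.5)]

$$|\Omega| \langle f:P_\Omega \rangle = \int_\Omega f(\vec{x})\,d\vec{x} = \sum_{j=1}^{n} \int_{\Omega_j} f(\vec{x})\,d\vec{x} = \sum_{j=1}^{n} |\Omega_j| \langle f:P_{\Omega_j} \rangle$$

or

$$\langle f:P_\Omega \rangle = \sum_{j=1}^{n} \alpha_j \langle f:P_{\Omega_j} \rangle \tag{E.3}$$

This expresses the average of $f(\vec{x})$ over $\Omega$ in terms of the
averages of $f(\vec{x})$ over the various subregions $\Omega_1, \Omega_2,
\ldots, \Omega_n$. What we would like to do now is derive an analogous
expression for the variance of $f(\vec{x})$ over $\Omega$.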

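Replacing $f$ by $f^2$ in (E.3), we have

$$\langle f^2:P_\Omega \rangle = \sum_{j=1}^{n} \alpha_j \langle f^2:P_{\Omega_j} \rangle$$

Inserting this and (E.3) into the usual expression for
$\mathrm{var}(f:P_\Omega)$, we have

$$\mathrm{var}(f:P_\Omega) = \langle f^2:P_\Omega \rangle - \langle f:P_\Omega \rangle^2$$

$$= \sum_j \alpha_j \langle f^2:P_{\Omega_j} \rangle - \Big( \sum_j \alpha_j \langle f:P_{\Omega_j} \rangle \Big)^2$$

$$= \sum_j \alpha_j \left[ \langle f^2:P_{\Omega_j} \rangle - \langle f:P_{\Omega_j} \rangle^2 \right] + \sum_j \alpha_j (1-\alpha_j) \langle f:P_{\Omega_j} \rangle^2 - 2 \sum_{j<k} \alpha_j \alpha_k \langle f:P_{\Omega_j} \rangle \langle f:P_{\Omega_k} \rangle$$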
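Using (E.2) we can rewrite the second term in the following way:

$$\sum_j \alpha_j (1-\alpha_j) \langle f:P_{\Omega_j} \rangle^2 = \sum_{j \ne k} \alpha_j \alpha_k \langle f:P_{\Omega_j} \rangle^2$$

$$= \sum_{j<k} \alpha_j \alpha_k \langle f:P_{\Omega_j} \rangle^2 + \sum_{j<k} \alpha_j \alpha_k \langle f:P_{\Omega_k} \rangle^2$$

$$= \sum_{j<k} \alpha_j \alpha_k \left( \langle f:P_{\Omega_j} \rangle^2 + \langle f:P_{\Omega_k} \rangle^2 \right)$$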


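Inserting this into the previous expression for $\mathrm{var}(f:P_\Omega)$,
we find

$$\mathrm{var}(f:P_\Omega) = \sum_j \alpha_j\, \mathrm{var}(f:P_{\Omega_j}) + \sum_{j<k} \alpha_j \alpha_k \left( \langle f:P_{\Omega_j} \rangle^2 + \langle f:P_{\Omega_k} \rangle^2 - 2 \langle f:P_{\Omega_j} \rangle \langle f:P_{\Omega_k} \rangle \right)$$

or equivalently

$$\mathrm{var}(f:P_\Omega) = \sum_{j=1}^{n} \alpha_j\, \mathrm{var}(f:P_{\Omega_j}) + \mathop{\sum_{j=1}^{n} \sum_{k=1}^{n}}_{j<k} \alpha_j \alpha_k \left( \langle f:P_{\Omega_j} \rangle - \langle f:P_{\Omega_k} \rangle \right)^2 \tag{E.4}$$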


The result (E.4) expresses $\mathrm{var}(f:P_\Omega)$ as a sum of two terms:
the first term is due to the variations in $f(\vec{x})$ within the various
subregions, and the second term is due to the variations in $f(\vec{x})$
among the various subregions. In particular, since the second term is never
negative we have the corollary

$$\mathrm{var}(f:P_\Omega) \ge \sum_{j=1}^{n} \alpha_j\, \mathrm{var}(f:P_{\Omega_j}) \tag{E.5}$$

which is to be contrasted with the result (E.3).
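
As a numerical illustration of (E.4) and (E.5), the sketch below evaluates
both sides on a fine midpoint grid. Everything specific in it is an
assumption made for the example: $\Omega$ is taken to be the unit interval,
the integrand is $f(x) = x^3$, and the partition edges are arbitrary. The
volume ratios $\alpha_j$ are computed as the fraction of grid points in each
subregion, so the identity holds to within round-off.

```python
def mean(values):
    return sum(values) / len(values)

def var(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

def f(x):
    return x ** 3                      # assumed example integrand

M = 10000
xs = [(k + 0.5) / M for k in range(M)]  # midpoint grid on Omega = [0, 1]
edges = [0.0, 0.2, 0.5, 1.0]            # assumed partition into n = 3 subregions

# Values of f inside each subregion, and the volume ratios alpha_j of (E.1)
parts = [[f(x) for x in xs if lo <= x < hi] for lo, hi in zip(edges, edges[1:])]
alphas = [len(p) / M for p in parts]
n = len(parts)

# First term of (E.4): variation of f within the subregions
within = sum(a * var(p) for a, p in zip(alphas, parts))
# Second term of (E.4): variation of f among the subregions
among = sum(
    alphas[j] * alphas[k] * (mean(parts[j]) - mean(parts[k])) ** 2
    for j in range(n) for k in range(j + 1, n)
)
total = var([f(x) for x in xs])
print(total, within + among, within)
```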


Appendix  F 

OPTIMUM APPORTIONMENT IN STRATIFIED SAMPLING

In the "stratified sampling" procedure discussed in Sec. 4-4,
the region of integration $\Omega$ is partitioned into n subregions
$\Omega_1, \Omega_2, \ldots, \Omega_n$, and a separate Monte Carlo
integration is performed in each subregion.

The square of the uncertainty associated with this Monte Carlo procedure
is [cf. (4.24)]

$$\Delta^{*2} = \sum_{j=1}^{n} |\Omega_j|^2 \sigma_j^2 N_j^{-1} \tag{F.1}$$

where $\sigma_j^2$ is the variance of the integrand with respect to the
uniform distribution of points inside $\Omega_j$. We wish to find the set of
values $\bar{N}_1, \bar{N}_2, \ldots, \bar{N}_n$ which minimizes (F.1),
subject to the condition

$$\sum_{j=1}^{n} N_j = N \tag{F.2}$$


Suppose that, for a given set of $N_j$ values, we vary each $N_j$ by a small
amount $\delta N_j$. These variations are presumed to be consistent with
condition (F.2) but otherwise quite arbitrary; in other words, all we require
of the small variations $\delta N_1, \delta N_2, \ldots, \delta N_n$ is that
they be such that

$$\delta N = \delta \sum_{j=1}^{n} N_j = 0$$

or equivalently,

$$\sum_{j=1}^{n} \delta N_j = 0 \tag{F.3}$$


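Now these small variations in the $N_j$'s will induce a small variation in
$\Delta^{*2}$ in the amount

$$\delta \Delta^{*2} = \sum_{j=1}^{n} \frac{\partial \Delta^{*2}}{\partial N_j}\, \delta N_j$$

or

$$\delta \Delta^{*2} = - \sum_{j=1}^{n} |\Omega_j|^2 \sigma_j^2 N_j^{-2}\, \delta N_j \tag{F.4}$$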


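In particular, if the variations $\delta N_j$ are all taken about the
minimizing values $\bar{N}_j$, then the variation in $\Delta^{*2}$ will
evidently vanish; hence, we have

$$\sum_{j=1}^{n} |\Omega_j|^2 \sigma_j^2 \bar{N}_j^{-2}\, \delta N_j = 0 \tag{F.5}$$

Now, the only way for (F.5) to hold for every set of variations
$(\delta N_1, \delta N_2, \ldots, \delta N_n)$ which satisfies (F.3) is for
the quantity multiplying $\delta N_j$ in (F.5) to be a constant, independent
of j:

$$|\Omega_j|^2 \sigma_j^2 \bar{N}_j^{-2} = C^2, \quad j = 1, \ldots, n \tag{F.6}$$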
For then and only then will (F.5) be satisfied without requiring of
the variations $\delta N_j$ anything more than is required by condition
(F.3). We therefore have

$$\bar{N}_j = C^{-1} |\Omega_j| \sigma_j \tag{F.7}$$

which says that $\bar{N}_j$ is proportional to the product of the volume
$|\Omega_j|$ times the rms deviation $\sigma_j$. The constant $C$ is
determined by requiring the $\bar{N}_j$ to satisfy (F.2):

$$\sum_{i=1}^{n} \bar{N}_i = C^{-1} \sum_{i=1}^{n} |\Omega_i| \sigma_i = N \tag{F.8}$$

Therefore,

$$\bar{N}_j = N\, \frac{|\Omega_j| \sigma_j}{\sum_{i=1}^{n} |\Omega_i| \sigma_i} \tag{F.9}$$


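With this result we can immediately calculate the minimum value of
$\Delta^{*2}$:

$$\Delta^{*2}_{min} = \sum_{j=1}^{n} |\Omega_j|^2 \sigma_j^2 \bar{N}_j^{-1} = N^{-1} \Big( \sum_{j=1}^{n} |\Omega_j| \sigma_j \Big)^2$$

whence

$$\Delta^{*}_{min} = N^{-1/2} \sum_{j=1}^{n} |\Omega_j| \sigma_j \tag{F.10}$$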
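
The apportionment rule (F.9) and the minimum uncertainty (F.10) can be
sketched in a few lines. This is only an illustration: the subregion volumes
and rms deviations below are made-up numbers, the function names are
hypothetical, and the $\bar{N}_j$ are left as real numbers (in practice one
would round them to integers).

```python
import math

def optimum_apportionment(volumes, sigmas, n_total):
    """Return the N_j of (F.9): proportional to |Omega_j| * sigma_j."""
    weights = [v * s for v, s in zip(volumes, sigmas)]
    total = sum(weights)
    return [n_total * w / total for w in weights]

def uncertainty(volumes, sigmas, counts):
    """Square root of (F.1) for a given apportionment."""
    return math.sqrt(sum((v * s) ** 2 / n
                         for v, s, n in zip(volumes, sigmas, counts)))

volumes = [0.5, 0.3, 0.2]   # assumed subregion volumes |Omega_j|
sigmas = [0.1, 1.0, 4.0]    # assumed rms deviations sigma_j
n_total = 1000

best = optimum_apportionment(volumes, sigmas, n_total)
equal = [n_total / len(volumes)] * len(volumes)

# (F.10): minimum uncertainty predicted for the optimum apportionment
predicted_min = sum(v * s for v, s in zip(volumes, sigmas)) / math.sqrt(n_total)
print(best)
print(uncertainty(volumes, sigmas, best), predicted_min)
print(uncertainty(volumes, sigmas, equal))
```

For these numbers, putting most of the points into the small but strongly
varying third subregion gives a smaller uncertainty than dividing the points
equally.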
Appendix G


OPTIMUM DENSITY FUNCTION FOR IMPORTANCE SAMPLING

The importance sampling procedure discussed in Sec. 4-5 is based
on the fact that the integral

$$I = \int_\Omega f(\vec{x})\,d\vec{x} \tag{G.1}$$

can be regarded as the mean of the function $f(\vec{x})/P(\vec{x})$, taken
with respect to the set of random points $\{\vec{x}_i\}$ distributed over
$\Omega$ according to the density function $P(\vec{x})$:

$$I = \int_\Omega [f(\vec{x})/P(\vec{x})]\, P(\vec{x})\,d\vec{x} = \langle f/P:P \rangle \tag{G.2}$$

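If we estimate this mean by averaging a finite number N of randomly
chosen elements of the set $\{f(\vec{x}_i)/P(\vec{x}_i)\}$, then the square
of the uncertainty in our estimate will be

$$\Delta^{*2} = \frac{\langle (f/P)^2:P \rangle - \langle f/P:P \rangle^2}{N} = N^{-1} \left\{ \int_\Omega [f(\vec{x})/P(\vec{x})]^2 P(\vec{x})\,d\vec{x} - I^2 \right\}$$

or

$$\Delta^{*2} = N^{-1} (J - I^2) \tag{G.3}$$

where we have defined

$$J \equiv \int_\Omega f^2(\vec{x})\, P^{-1}(\vec{x})\,d\vec{x} \tag{G.4}$$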
We now ask the question, what density function $P(\vec{x})$ will render
$\Delta^{*2}$ a minimum? Since the quantities $f$, $\Omega$ and N are
presumed fixed, then clearly $\Delta^{*2}$ will be a minimum if and only if
$J$ is a minimum.

Let $P_0(\vec{x})$ be the minimizing function; that is, $P_0(\vec{x})$
satisfies the conditions

$$P_0(\vec{x}) \ge 0 \quad \text{for } \vec{x} \in \Omega \tag{G.5}$$

and

$$\int_\Omega P_0(\vec{x})\,d\vec{x} = 1 \tag{G.6}$$

and furthermore, of all functions $P(\vec{x})$ which satisfy these
conditions, $P_0(\vec{x})$ causes $J$ in (G.4) to assume the smallest value.
Form the family of functions $P_\epsilon(\vec{x})$ according to

$$P_\epsilon(\vec{x}) = P_0(\vec{x}) + \epsilon\, \eta(\vec{x}) \tag{G.7}$$

where $\epsilon$ is a real variable (the family parameter), and where
$\eta(\vec{x})$ is any function which satisfies the condition

$$\int_\Omega \eta(\vec{x})\,d\vec{x} = 0 \tag{G.8}$$

With (G.6), this condition on $\eta(\vec{x})$ evidently insures that each
function $P_\epsilon(\vec{x})$ in (G.7) satisfies the requirement

$$\int_\Omega P_\epsilon(\vec{x})\,d\vec{x} = 1 \tag{G.9}$$
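It follows, then, that the quantity

$$J(\epsilon) \equiv \int_\Omega f^2(\vec{x})\, P_\epsilon^{-1}(\vec{x})\,d\vec{x} \tag{G.10}$$

has a minimum at $\epsilon = 0$. Differentiating (G.10) with respect to
$\epsilon$ gives

$$J'(\epsilon) = \frac{d}{d\epsilon} \int_\Omega f^2(\vec{x})\, P_\epsilon^{-1}(\vec{x})\,d\vec{x}$$

$$= \int_\Omega f^2(\vec{x})\, \frac{\partial}{\partial \epsilon} P_\epsilon^{-1}(\vec{x})\,d\vec{x}$$

$$= \int_\Omega f^2(\vec{x}) \left[ -P_\epsilon^{-2}(\vec{x})\, \frac{\partial}{\partial \epsilon} P_\epsilon(\vec{x}) \right] d\vec{x}$$

or, with (G.7),

$$J'(\epsilon) = - \int_\Omega f^2(\vec{x})\, P_\epsilon^{-2}(\vec{x})\, \eta(\vec{x})\,d\vec{x} \tag{G.11}$$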

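The fact that $J(\epsilon)$ has a minimum at $\epsilon = 0$ implies that
$J'(0) = 0$; hence, we have

$$\int_\Omega f^2(\vec{x})\, P_0^{-2}(\vec{x})\, \eta(\vec{x})\,d\vec{x} = 0 \tag{G.12}$$

Now, the only way for (G.12) to hold for every function $\eta(\vec{x})$
which satisfies (G.8) is for the quantity multiplying $\eta(\vec{x})$ in
(G.12) to be constant over $\Omega$:

$$f^2(\vec{x})\, P_0^{-2}(\vec{x}) \equiv C^2, \quad \text{all } \vec{x} \in \Omega \tag{G.13}$$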
For then and only then will (G.12) be satisfied without requiring of
$\eta(\vec{x})$ anything more than is required by condition (G.8). Combining
(G.13) with (G.5) yields the result

$$P_0(\vec{x}) = C^{-1} |f(\vec{x})| \tag{G.14}$$

The precise value of $C$ is then determined through requirement (G.6):

$$1 = \int_\Omega P_0(\vec{x}')\,d\vec{x}' = C^{-1} \int_\Omega |f(\vec{x}')|\,d\vec{x}' \tag{G.15}$$

Therefore, the density function $P_0(\vec{x})$ which minimizes $J$, and hence
$\Delta^{*2}$, is

$$P_0(\vec{x}) = |f(\vec{x})| \Big/ \int_\Omega |f(\vec{x}')|\,d\vec{x}' \tag{G.16}$$

With this result we can easily calculate the minimum value of $J$:

$$J_{min} \equiv J(\epsilon = 0) = \int_\Omega f^2(\vec{x})\, P_0^{-1}(\vec{x})\,d\vec{x}$$

$$= \int_\Omega |f(\vec{x}')|\,d\vec{x}' \cdot \int_\Omega f^2(\vec{x})\, |f(\vec{x})|^{-1}\,d\vec{x}$$

$$= \int_\Omega |f(\vec{x}')|\,d\vec{x}' \cdot \int_\Omega |f(\vec{x})|\,d\vec{x}$$

whence

$$J_{min} = \Big( \int_\Omega |f(\vec{x})|\,d\vec{x} \Big)^2 \tag{G.17}$$


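The minimum value of $\Delta^{*2}$ is thus [cf. (G.3)]

$$\Delta^{*2}_{min} = N^{-1} (J_{min} - I^2)$$

or

$$\Delta^{*2}_{min} = N^{-1} \left\{ \Big( \int_\Omega |f(\vec{x})|\,d\vec{x} \Big)^2 - \Big( \int_\Omega f(\vec{x})\,d\vec{x} \Big)^2 \right\} \tag{G.18a}$$

Using the identity $a^2-b^2=(a-b)(a+b)$, this result can also be written in
the form

$$\Delta^{*2}_{min} = 4N^{-1} \int_{\Omega_-} |f(\vec{x})|\,d\vec{x} \int_{\Omega_+} |f(\vec{x})|\,d\vec{x} \tag{G.18b}$$

where $\Omega_-$ is the subregion of $\Omega$ in which $f(\vec{x})<0$ and
$\Omega_+$ is the subregion of $\Omega$ in which $f(\vec{x})>0$. It follows
from (G.18b) that $\Delta^{*2}_{min}$ will vanish if and only if
$f(\vec{x})$ never changes sign inside $\Omega$ (in which case either
$|\Omega_-|$ or $|\Omega_+|$ vanishes).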
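
The vanishing of the uncertainty for an integrand of one sign can be seen in
a small experiment. The sketch below is only an illustration under assumed
conditions: $\Omega$ is the unit interval and $f(x) = x$, so that (G.16)
gives $P_0(x) = 2x$, whose distribution function $F(x) = x^2$ is inverted by
a square root. The helper name `estimate` is hypothetical.

```python
import math
import random
import statistics

def f(x):
    return x     # assumed example integrand; its integral over [0, 1] is 1/2

def estimate(draw, density, n, seed=0):
    """Average f/P over n points drawn with density P.

    Returns the estimate of I and its one-standard-deviation uncertainty.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x = draw(1.0 - rng.random())   # uniform number in (0, 1], so x is never 0
        values.append(f(x) / density(x))
    return sum(values) / n, statistics.pstdev(values) / math.sqrt(n)

# Uniform sampling: P(x) = 1
uni_mean, uni_err = estimate(lambda u: u, lambda x: 1.0, 10000)
# Optimum sampling (G.16): P0(x) = 2x, drawn by inverting F(x) = x**2
opt_mean, opt_err = estimate(math.sqrt, lambda x: 2.0 * x, 10000)
print(uni_mean, uni_err)
print(opt_mean, opt_err)
```

With the optimum density every value $f/P_0$ equals $I$ itself, which is also
why the optimum density cannot be used in practice: normalizing it requires
knowing the integral of $|f|$ in advance.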
Appendix H

SOME ONE-VARIABLE IMPORTANCE SAMPLING DENSITY FUNCTIONS

The basic idea behind the one-variable importance sampling technique
described in Sec. 4-6 may be briefly stated as follows: In a Monte Carlo
calculation of the integral

$$I = \int_0^1 dr_1 \cdots \int_0^1 dr_n\, h(r_1, \ldots, r_n) \tag{H.1}$$

we may generate any particular coordinate $r_j$ inside the unit interval
according to a non-uniform density function $P(r_j)$, provided we take
$h(r_1,\ldots,r_n)/P(r_j)$ as the quantity to be averaged instead of
$h(r_1,\ldots,r_n)$. Doing this can be advantageous if $h(r_1,\ldots,r_n)$
happens to assume extremal values whenever the variable $r_j$ falls inside
some small subinterval $(\alpha_j,\beta_j)$ of the unit interval. For then,
by choosing $P(r_j)$ to be large whenever $r_j \in (\alpha_j,\beta_j)$, the
denominator in the quantity being averaged will moderate the extremal
behavior of the numerator in that critical $r_j$ interval. As a result, the
variance of the values being averaged will be reduced, and the Monte Carlo
uncertainty will be made smaller.


In this appendix we shall describe several simple density functions
$P(r)$ which can be used for one-variable importance sampling. Generally,
any such density function must satisfy the requirements

$$P(r) > 0 \quad \text{for } 0 \le r \le 1 \tag{H.2a}$$

$$\int_0^1 P(r)\,dr = 1 \tag{H.2b}$$
The condition (H.2a), that $P(r)$ not vanish inside the unit interval,
is simply to insure boundedness of $h(r_1,\ldots,r_n)/P(r)$ [$r$ stands of
course for one of the $r_j$ variables], and can be relaxed at any point where
$h \to 0$ faster than $P \to 0$. As discussed in Sec. 4-6, in order to
actually importance sample the variable $r$ according to a given density
function $P(r)$, it is most convenient to have a computer subroutine which
does the following:

1°  Accepts as input a number $r'$ in the unit interval. [Normally,
    $r'$ will be a given random number from the uniform distribution
    in the unit interval.]

2°  Calculates and outputs that value $r$ which satisfies

    $$r' = F(r) \tag{H.3}$$

    where $F$ is the distribution function corresponding to the
    density function $P$. [This number $r$ will be used in
    evaluating $h(r_1,\ldots,r_n)$.]

3°  Calculates and outputs the value $P(r)$ for the $r$ value
    found in 2°. [This number $P(r)$ will be divided into
    $h(r_1,\ldots,r_n)$.]

In what follows we shall develop equations from which such a computer
subroutine can be written for three simple functions $P(r)$ that the author
has found to be particularly useful. In addition, we shall see how one
can go about constructing a one-variable importance sampling subroutine
for an arbitrarily shaped density function $P(r)$.


► $P(r) \propto e^{-\Gamma r}$ ($\Gamma > 0$)

This density function is often useful in cases where the integrand
$h$ assumes extremal values whenever $r$ is near 0. The size of the
parameter $\Gamma$ is to be chosen, normally by trial-and-error, to be
commensurate with the degree of peaking of $h$ near $r=0$; the greater this
peaking the larger $\Gamma$ should be. The normalization condition (H.2b)
allows us to determine the normalization constant, and one easily finds that
the correctly normalized density function is

$$P(r) = \frac{\Gamma}{1 - e^{-\Gamma}}\, e^{-\Gamma r} \quad (0 \le r \le 1) \tag{H.4}$$

The calculation of the corresponding distribution function is
straightforward, and yields

$$F(r) \equiv \int_0^r P(r')\,dr' = \frac{1 - e^{-\Gamma r}}{1 - e^{-\Gamma}} \tag{H.5}$$

We note as a check that, as $r$ increases from 0 to 1, $F(r)$ also increases
from 0 to 1. Putting (H.5) into (H.3) and solving for $r$ yields

$$r = -\frac{1}{\Gamma} \ln\left[ 1 - r'(1 - e^{-\Gamma}) \right] \tag{H.6}$$

Thus, given any $r'$ between 0 and 1 [item 1°], we calculate $r$ from
(H.6) [item 2°]; then, using this value of $r$, we calculate $P(r)$ from
(H.4) [item 3°].
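
Items 1° to 3° for this density function might be coded as follows. This is
a minimal sketch of (H.4) and (H.6) only; the function name
`sample_exp_near_zero` and the argument name `gamma` (standing for the
peaking parameter) are our own choices, not taken from the text.

```python
import math

def sample_exp_near_zero(r_prime, gamma):
    """Map r' in [0, 1] to (r, P(r)) for P(r) proportional to exp(-gamma * r)."""
    norm = 1.0 - math.exp(-gamma)
    r = -math.log(1.0 - r_prime * norm) / gamma   # item 2: invert F, eq. (H.6)
    p = gamma * math.exp(-gamma * r) / norm       # item 3: density, eq. (H.4)
    return r, p

# Example: a few input values r' with an assumed peaking parameter of 5
for r_prime in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r_prime, sample_exp_near_zero(r_prime, 5.0))
```

In a Monte Carlo run, `r_prime` would be a uniform random number, `r` would
be used in evaluating $h$, and the resulting value of $h$ would be divided by
`p`.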


► $P(r) \propto e^{-\Gamma(1-r)}$ ($\Gamma > 0$)

This density function is often useful in cases where the integrand $h$
assumes extremal values whenever $r$ is near 1. The larger these extremal
values are, the greater the value of $\Gamma$ should be. Since this density
function is just the mirror image of (H.4) about the line $r=1/2$, the
normalization constant is the same:

$$P(r) = \frac{\Gamma}{1 - e^{-\Gamma}}\, e^{-\Gamma(1-r)} \quad (0 \le r \le 1) \tag{H.7}$$

The calculation of the corresponding distribution function is
straightforward, and yields

$$F(r) \equiv \int_0^r P(r')\,dr' = \frac{e^{\Gamma r} - 1}{e^{\Gamma} - 1} \tag{H.8}$$

Putting this into (H.3) and solving for $r$ yields

$$r = \frac{1}{\Gamma} \ln\left[ 1 + r'(e^{\Gamma} - 1) \right] \tag{H.9}$$

Thus, given any $r'$ between 0 and 1 [item 1°], we calculate $r$ from
(H.9) [item 2°]; then, using this value of $r$, we calculate $P(r)$ from
(H.7) [item 3°].
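
A corresponding sketch for this mirror-image density follows, again with
hypothetical names (`sample_exp_near_one`, `gamma`) and implementing only
(H.7) and (H.9).

```python
import math

def sample_exp_near_one(r_prime, gamma):
    """Map r' in [0, 1] to (r, P(r)) for P(r) proportional to exp(-gamma * (1 - r))."""
    r = math.log(1.0 + r_prime * (math.exp(gamma) - 1.0)) / gamma      # eq. (H.9)
    p = gamma * math.exp(-gamma * (1.0 - r)) / (1.0 - math.exp(-gamma))  # eq. (H.7)
    return r, p

# Example: the output values crowd toward r = 1 as r' increases uniformly
for r_prime in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r_prime, sample_exp_near_one(r_prime, 5.0))
```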


► $P(r) \propto 1/[(r - r_0)^2 + \Gamma^2]$ ($0 \le r_0 \le 1$, $\Gamma > 0$)

This density function is often useful in cases where the integrand
is peaked whenever

$$\max(0, r_0-\Gamma) < r < \min(1, r_0+\Gamma)$$

where $r_0$ is any point in the unit interval. The greater the integrand
peaking at $r_0$, the smaller the value of $\Gamma$ should be [in contrast
to the two previous density functions]. The normalization constant is easily
determined by requiring $P(r)$ to satisfy (H.2b), and the correctly
normalized density function is found to be

$$P(r) = \frac{\Gamma}{A + B}\, \frac{1}{(r - r_0)^2 + \Gamma^2} \quad (0 \le r \le 1) \tag{H.10}$$

where the constants $A$ and $B$ are defined by

$$A \equiv \arctan\left[ \frac{1 - r_0}{\Gamma} \right], \quad B \equiv \arctan\left[ \frac{r_0}{\Gamma} \right] \tag{H.11}$$

The calculation of the corresponding distribution function yields

$$F(r) \equiv \int_0^r P(r')\,dr' = \left( \frac{1}{A + B} \right) \left\{ \arctan\left[ \frac{r - r_0}{\Gamma} \right] + B \right\} \tag{H.12}$$

Putting this into (H.3) and solving for $r$ yields

$$r = r_0 + \Gamma \tan\left[ r'(A + B) - B \right] \tag{H.13}$$

Thus, given any $r'$ between 0 and 1 [item 1°], we calculate the
corresponding value of $r$ from (H.13) [item 2°], where $A$ and $B$ are
defined in (H.11); then, using this value of $r$, we calculate the value
$P(r)$ from (H.10) [item 3°].
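
The same three items for this peaked density might be sketched as below. The
names `sample_peaked`, `r0` and `gamma` are illustrative; the body follows
(H.10), (H.11) and (H.13).

```python
import math

def sample_peaked(r_prime, r0, gamma):
    """Map r' in [0, 1] to (r, P(r)) for P(r) proportional to 1/((r - r0)**2 + gamma**2)."""
    a = math.atan((1.0 - r0) / gamma)                         # A of (H.11)
    b = math.atan(r0 / gamma)                                 # B of (H.11)
    r = r0 + gamma * math.tan(r_prime * (a + b) - b)          # eq. (H.13)
    p = gamma / ((a + b) * ((r - r0) ** 2 + gamma ** 2))      # eq. (H.10)
    return r, p

# Example: an assumed peak at r0 = 0.3 with width parameter 0.05
for r_prime in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r_prime, sample_peaked(r_prime, 0.3, 0.05))
```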

The sharpness of the peaking of each of the foregoing three density
functions is controlled by the single parameter $\Gamma$, which can
be varied in the "preliminary runs" to determine a more or less optimal
value [see Sec. 4-6]. The shape of each of these density functions is
clearly restricted by its analytical form; however, it usually turns out
that one's knowledge of the dependence of the function $h(r_1,\ldots,r_n)$ on
any one of its variables is so meager that one cannot take advantage
of very much flexibility in the shape of importance sampling density
functions. Nevertheless, situations do occasionally arise in which one
clearly sees the need to importance sample some variable $r$ according to a
density function $P(r)$ of a very specific shape. If, in such a case, it
appears to be impractical to find an analytic form for $P(r)$ which is simple
enough that its distribution function can be calculated and inverted, then
one can always approximate the desired $P(r)$ curve as closely as necessary
by a piecewise linear curve, as indicated in Fig. 11. The point here is
that it is fairly easy to write a computer subroutine which will: (i) accept
the "pivot points" $(\rho_i,\sigma_i)$ on input data cards; (ii) scale the
ordinates $\sigma_i$ so that the total area under the piecewise linear curve
is unity, thereby rendering the piecewise linear curve a properly normalized
density function; and (iii) calculate, for any given value $r'$ between 0
and 1, that value $r$ for which the area under the piecewise linear curve
between the vertical lines through 0 and $r$ is equal to $r'$. This last
step is of course equivalent to inverting the distribution function
corresponding to the piecewise linear density function.
To carry out item (ii) above, one need only recognize that the "raw
area" under the trapezoid between $r=\rho_i$ and $r=\rho_{i+1}$ is [see
Fig. 11]

$$A_i = (\rho_{i+1} - \rho_i) \cdot \sigma_i + \tfrac{1}{2} (\rho_{i+1} - \rho_i) \cdot (\sigma_{i+1} - \sigma_i) \tag{H.14}$$

and the total area under the piecewise linear curve can be made unity if
each ordinate $\sigma_i$ is simply divided by the total raw area
$\sum_i A_i$. Having thus normalized the piecewise linear curve, one can
carry out item (iii) by observing that the area under the curve between
$r=\rho_i$ and $r=\rho$ ($\rho_i \le \rho \le \rho_{i+1}$) is [cf. (H.14)]

$$A = \alpha (\rho_{i+1} - \rho_i) \cdot \sigma_i + \tfrac{1}{2} \alpha (\rho_{i+1} - \rho_i) \cdot \alpha (\sigma_{i+1} - \sigma_i)$$

where $\alpha$ is the fractional distance of $\rho$ from $\rho_i$ to
$\rho_{i+1}$:

$$\alpha \equiv (\rho - \rho_i)/(\rho_{i+1} - \rho_i)$$

From these two equations one can show that the value of $\rho$ which
corresponds to a given value of $A$ ($A \le A_i$) is

$$\rho = \begin{cases} \rho_i + A/\sigma_i, & \text{if } \sigma_{i+1} = \sigma_i \\ \rho_i + \left( \sqrt{\sigma_i^2 + 2Am_i} - \sigma_i \right)/m_i, & \text{if } \sigma_{i+1} \ne \sigma_i \end{cases} \tag{H.15}$$

where

$$m_i \equiv (\sigma_{i+1} - \sigma_i)/(\rho_{i+1} - \rho_i) \tag{H.16}$$

In applying these equations it is important to note that it is not
necessary for the points $\rho_i$ to be equally spaced along the r-axis;
all that is required is that each $\rho_{i+1}$ be greater than $\rho_i$, and
each $\sigma_i$ be positive.
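
Steps (ii) and (iii) can be sketched as follows. This is an illustrative
outline rather than a reproduction of any particular subroutine: the pivot
points are passed as two lists instead of being read from data cards, the
function and variable names are hypothetical, and the first and last
abscissae are assumed to be 0 and 1.

```python
import math

def make_piecewise_linear_sampler(rho, sigma):
    """Build r' -> (r, P(r)) for the piecewise linear curve through (rho[i], sigma[i]).

    rho must increase from 0 to 1 and every sigma[i] must be positive.
    """
    # Raw trapezoid areas, eq. (H.14), then scale so the total area is unity
    raw = [(rho[i + 1] - rho[i]) * sigma[i]
           + 0.5 * (rho[i + 1] - rho[i]) * (sigma[i + 1] - sigma[i])
           for i in range(len(rho) - 1)]
    total = sum(raw)
    sig = [s / total for s in sigma]
    areas = [a / total for a in raw]

    def sample(r_prime):
        # Locate the trapezoid in which the accumulated area reaches r'
        remaining = r_prime
        last = len(areas) - 1
        for i, area in enumerate(areas):
            if remaining <= area or i == last:
                break
            remaining -= area
        remaining = min(remaining, areas[i])          # guard against round-off
        m = (sig[i + 1] - sig[i]) / (rho[i + 1] - rho[i])   # eq. (H.16)
        if m == 0.0:                                  # flat segment, eq. (H.15)
            r = rho[i] + remaining / sig[i]
        else:                                         # sloped segment, eq. (H.15)
            r = rho[i] + (math.sqrt(sig[i] ** 2 + 2.0 * remaining * m) - sig[i]) / m
        alpha = (r - rho[i]) / (rho[i + 1] - rho[i])
        p = sig[i] + alpha * (sig[i + 1] - sig[i])    # normalized density at r
        return r, p

    return sample

# Example: an assumed tent-shaped curve peaked at r = 0.5
sampler = make_piecewise_linear_sampler([0.0, 0.5, 1.0], [1.0, 3.0, 1.0])
for r_prime in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r_prime, sampler(r_prime))
```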

With the ability to generate random numbers in the unit interval
according to any bounded, piecewise linear density function, we obviously
have great flexibility for carrying out the one-variable importance
sampling procedure described in Sec. 4-6. However, as mentioned previously,
one's knowledge of the behavior of the integrand $h(r_1,\ldots,r_n)$
as a function of any of the variables $r_j$ is usually so limited that one
cannot take full advantage of this flexibility. In practice,
therefore, a simple analytic density function, like one of the three
described earlier, usually proves adequate.
Appendix I

REMARKS ON THE MARKOV CHAIN MONTE CARLO METHOD

In the field of statistical mechanics one is confronted with the
task of calculating macroscopic properties of systems composed of very
many identical microscopic components which interact according to known
(or hypothesized) laws. Examples are the calculation of the equation of
state of a gas composed of molecules which interact via a specified
intermolecular force, and the calculation of the magnetization of a lattice
of atoms whose magnetic moments interact with each other and with an
external magnetic field according to the laws of electrodynamics. Typically,
these calculations require the evaluation of one or more n-dimensional
integrals, where n is of the order of the number of microscopic components
(e.g., molecules or atoms) in the system under study. In most cases
these integrals cannot be calculated either by analytical methods or by
classical numerical methods; in fact, it usually happens that even the
conventional Monte Carlo method breaks down for these problems.* As a
consequence, workers in this field usually employ another Monte Carlo method,
one which was first used by Metropolis and co-workers in 1953 to calculate
the equation of state of a two-dimensional gas of hard disks (Ref. 10).

The distinguishing feature of this Monte Carlo approach is that it utilizes
the mathematical concept of a "Markov chain" or a "Markovian random walk".
It is not our purpose in this report to lay out the theory of this special
Monte Carlo technique; however, the successes of the Markov chain approach
have become so well known that any introduction to Monte Carlo methods
would be incomplete without some discussion of it. Therefore, in this
appendix we shall give a very brief description of the Markov chain Monte
Carlo method; for more explicit discussions, the reader should consult
Chapter 9 of Hammersley and Handscomb (Ref. 1), and the extensive review
article by Wood (Ref. 11) and references contained therein.

*However, in Sec. 2-10 we derive a set of generating formulae [see (2.94)]
which might be used to calculate by the conventional Monte Carlo procedure
the equilibrium properties of a one-dimensional gas of impenetrable
molecules.

The general problem is once again to evaluate an integral of the form

$$I = \int_\Omega f(\vec{x})\, P(\vec{x})\,d\vec{x} \tag{I.1}$$

where $P(\vec{x})$ is a probability density function normalized over the
finite, n-dimensional region $\Omega$. In the context of statistical
mechanical applications, the multi-dimensional variable $\vec{x}$ usually
specifies the physical "state" of some system (e.g., the spatial coordinates
of all the molecules in a gas, or the magnetic moment orientations of all the
atoms in a magnetic substance); $\Omega$ is the set of all physically
allowable states; $f(\vec{x})$ denotes the value of some dynamical variable
$f$ when the system is in state $\vec{x}$; and $P(\vec{x})\,d\vec{x}$ denotes
the probability for a randomly chosen system from an appropriate "statistical
ensemble" of systems to be in some state in the infinitesimal region
$d\vec{x}$ around $\vec{x}$. The form of the function $P(\vec{x})$ is given
by the laws of statistical mechanics, and is usually taken to be the
"canonical ensemble" probability density

$$P(\vec{x}) = e^{-U(\vec{x})/kT} \Big/ \int_\Omega e^{-U(\vec{x}')/kT}\,d\vec{x}' \quad (\vec{x} \in \Omega)$$

where $U(\vec{x})$ is the total energy of the system in the state $\vec{x}$,
$T$ the absolute temperature, and $k$ Boltzmann's constant. In this context,
the value of $I$ in (I.1) would be interpreted physically as "the equilibrium
value of $f$ at temperature $T$". The dimensionality n of $\Omega$ is
usually quite large, and $P(\vec{x})$ is usually analytically complicated and
exceedingly small over most of $\Omega$.† More often than not, these
conditions combine to make it totally impractical to generate random points
according to $P(\vec{x})$ by either the inversion method or the rejection
method; as a result, the Markov chain Monte Carlo method frequently offers
the only hope for evaluating $I$.

†With reference to the problem of the one-dimensional gas of impenetrable
rods discussed in Sec. 2-10, the region $\Omega$ considered here would be
more properly associated with the region $\Sigma$ in (2.95) rather than the
region $\Omega$ in (2.77). The complicated nature of $P(\vec{x})$ in that
case arises from the fact that $P(\vec{x})$ vanishes everywhere inside
$\Sigma$ (2.95) except in that extremely small and oddly shaped subregion
$\Omega$ (2.77).

In the Markov chain approach, it is convenient to regard the
space of the variable $\vec{x}$ as being discrete rather than continuous.
To do this we can imagine setting up in the space of $\vec{x}$ a very fine
n-dimensional cubic grid or mesh, which subdivides $\Omega$ into a total of
B cells of equal size $\Delta\vec{x}$. We number these cells by the index i
in any convenient fashion, and we let $\vec{x}_i$ denote the center of the
ith cell. Thus, whereas originally the system could be in a non-denumerably
infinite number of states $\vec{x} \in \Omega$, we now suppose that the
system can only be in one of the B states $\vec{x}_1, \vec{x}_2, \ldots,
\vec{x}_B$ in $\Omega$. It is of no concern to us, either theoretically or
practically, how extremely large B is, just so long as it is finite.
Now, since $P(\vec{x})$ in (I.1) is normalized over $\Omega$, then provided
the mesh size $\Delta\vec{x}$ is taken sufficiently small we can write with
negligible error


$$I \cong \frac{\int_\Omega f(\vec{x})\, P(\vec{x})\,d\vec{x}}{\int_\Omega P(\vec{x})\,d\vec{x}} \cong \frac{\sum_{i=1}^{B} f(\vec{x}_i)\, P(\vec{x}_i)\, \Delta\vec{x}}{\sum_{i=1}^{B} P(\vec{x}_i)\, \Delta\vec{x}}$$

Hence,

$$I \cong \sum_{i=1}^{B} f(\vec{x}_i)\, \pi_i \tag{I.2}$$

where

$$\pi_i \equiv P(\vec{x}_i) \Big/ \sum_{k=1}^{B} P(\vec{x}_k) \tag{I.3}$$

is the (correctly normalized) probability associated with the discrete
state $\vec{x}_i$.

[Footnote, only partly legible in the source:] ... the computer can handle
without "underflowing", then the computer will necessarily ... a collection
of ... discrete points on an n-dimensional cubic lattice.

If we could somehow generate N random states $x_{i_1}, x_{i_2}, \ldots,
x_{i_N}$ according to the probabilities $\pi_i$ in (1.3), then we could
approximate $I$ as the average of $f$ over these states,

$$ I \simeq \frac{1}{N} \sum_{j=1}^{N} f(x_{i_j}) \qquad (1.4) $$

with the associated uncertainty being determined in the usual way from the
variance of the values being averaged. However, by hypothesis it is not
possible to generate such a set of random states by any of the conventional
Monte Carlo methods.
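For a toy lattice small enough that the $\pi_i$ can be tabulated, the average (1.4) and its uncertainty can be sketched as follows. The four-state example, the helper name `direct_average` and the use of Python's `random.choices` to draw the states are assumptions made for illustration only; in the problems of interest here it is exactly this direct sampling step that is presumed unavailable.

```python
import math
import random

def direct_average(f_values, pi, n_samples, seed=0):
    """Estimate I = sum_i f(x_i) pi_i by averaging f over N states drawn from pi."""
    rng = random.Random(seed)
    # Values of f at N states chosen independently according to pi
    draws = rng.choices(f_values, weights=pi, k=n_samples)
    mean = sum(draws) / n_samples
    # Sample variance of the values being averaged
    var = sum((v - mean) ** 2 for v in draws) / (n_samples - 1)
    # Estimate and its one-standard-deviation uncertainty
    return mean, math.sqrt(var / n_samples)

f_values = [0.0, 1.0, 4.0, 9.0]          # hypothetical f(x_i)
pi = [0.1, 0.2, 0.3, 0.4]                # hypothetical state probabilities
exact = sum(f * p for f, p in zip(f_values, pi))
estimate, sigma = direct_average(f_values, pi, 20000)
print(exact, estimate, sigma)
```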

At this point we introduce the (seemingly unrelated) concept of a
Markovian random walk over the lattice of points $x_i$ inside $\Omega$.
More specifically, we consider a walk over these discrete states which is
governed solely by a set of one-step probabilities $p_{ij}$
($i,j = 1, \ldots, B$), defined by

    $p_{ij}$ = the probability that, if the walker is at state $x_i$, the
    next step will carry the walker to state $x_j$.                  (1.5)

It should be clearly understood that such a random walk over the states
$x_i$ has nothing at all to do with the actual time behavior of the
physical system under study; the random walk is merely a mathematical
artifice which we are introducing in order to effect a calculation of the
integral $I$. The adjective "Markovian" simply means that the probability
of walking from $x_i$ to $x_j$ is independent of the past history of the
walk (i.e., of where the walker was before coming to state $x_i$); if the
situation were otherwise, the random walk would be "non-Markovian", and
would not be describable simply by the $B^2$ probabilities in (1.5).


Since $p_{ij}$ is a probability, we have

$$ 0 \le p_{ij} \le 1 \quad (i,j = 1, \ldots, B) \qquad (1.6) $$

Furthermore, since the walker will always step from $x_i$ to some $x_j$,
we have

$$ \sum_{j=1}^{B} p_{ij} = 1 \quad (i = 1, \ldots, B) \qquad (1.7) $$


The probability $p_{ij}^{(2)}$ that the walker will go from $x_i$ to $x_j$
in two steps is, by the multiplication and addition laws for probabilities,

$$ p_{ij}^{(2)} = \sum_{k=1}^{B} p_{ik}\,p_{kj} $$

In general we define the n-step probabilities $p_{ij}^{(n)}$
($i,j = 1, \ldots, B$) by

    $p_{ij}^{(n)}$ = the probability of walking from state $x_i$ to
    state $x_j$ in n steps.                                          (1.8)


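It is clear that $p_{ij}^{(1)} = p_{ij}$, and that for any $n \ge 2$

$$ p_{ij}^{(n)} = \sum_{k_1=1}^{B} \sum_{k_2=1}^{B} \cdots \sum_{k_{n-1}=1}^{B} p_{i k_1}\,p_{k_1 k_2} \cdots p_{k_{n-1} j} \qquad (1.9) $$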
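The chained sums in (1.9) can be checked numerically on a small example. The sketch below evaluates (1.9) literally, by summing over every sequence of intermediate states, and compares the result with a step-by-step accumulation in which each new step applies the two-step rule once more. The three-state probabilities and both function names are hypothetical choices made for illustration.

```python
from itertools import product

def n_step_by_paths(p, n, i, j):
    """Evaluate (1.9) literally: sum over all intermediate states k_1..k_{n-1}."""
    B = len(p)
    total = 0.0
    for ks in product(range(B), repeat=n - 1):
        path = (i,) + ks + (j,)
        term = 1.0
        for a, b in zip(path, path[1:]):
            term *= p[a][b]               # multiplication law along one path
        total += term                     # addition law over distinct paths
    return total

def n_step_by_recursion(p, n):
    """Build the n-step probabilities one step at a time."""
    B = len(p)
    current = [row[:] for row in p]       # p^(1) = p
    for _ in range(n - 1):
        current = [[sum(current[i][k] * p[k][j] for k in range(B))
                    for j in range(B)] for i in range(B)]
    return current

# Hypothetical one-step probabilities; each row sums to 1 as in (1.7).
p = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
p3 = n_step_by_recursion(p, 3)
print(p3[0], [n_step_by_paths(p, 3, 0, j) for j in range(3)])
```

The literal path sum costs on the order of $B^{n-1}$ terms per entry, which is why the step-by-step form is the one to use in practice.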


If we regard the probabilities $p_{ij}$ as elements of a $B \times B$
matrix $P$,

$$ P \equiv \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1B} \\ p_{21} & p_{22} & \cdots & p_{2B} \\ \vdots & \vdots & & \vdots \\ p_{B1} & p_{B2} & \cdots & p_{BB} \end{pmatrix} \qquad (1.10) $$

then (1.9) simply says that the $B \times B$ matrix $P^{(n)}$ of the n-step
probabilities $p_{ij}^{(n)}$ is obtained by raising $P$ to the nth power:

$$ P^{(n)} = P^{n} \qquad (1.11) $$

Granted, then, that we can conceive of a random walk over the discrete
states in $\Omega$, the walk being characterized by a one-step probability
matrix $P$, what does this have to do with estimating the quantity $I$ in
(1.2)? The answer to this question is supplied in part by the following
theorem [a rigorous proof of which may be found in Chapter XV of Ref. 8]:


Theorem: Suppose that, for a given set of state probabilities $\pi_i$,
we have a set of one-step probabilities $p_{ij}$ which satisfy the
following two conditions:

A (Ergodic Condition). If $x_i$ and $x_j$ are any two states in $\Omega$
with non-vanishing probabilities $\pi_i$ and $\pi_j$, then there exists
some finite n for which $p_{ij}^{(n)} > 0$.

B (Steady State Condition).

$$ \sum_{i=1}^{B} \pi_i\,p_{ij} = \pi_j \quad \text{for all } j = 1, \ldots, B \qquad (1.12) $$

Then we will have, independently of i and for all j:

$$ \lim_{n \to \infty} p_{ij}^{(n)} = \pi_j \qquad (1.13) $$


In other words, if for a given set of state probabilities $\pi_i$ we can
construct a random walk matrix $P$ which satisfies conditions (A) and (B)
above, then by starting a random walk in any state $x_i$, the probability
of winding up in state $x_j$ after a sufficiently large number of steps
will be $\pi_j$. The ergodic condition (A) essentially requires $P$ to be
such that any possible state $x_j$ be reachable from any possible state
$x_i$ by a finite number of steps. The steady state condition (B)
essentially requires $P$ to be such that if many walkers are randomly
distributed over the discrete states $x_i$ according to the probabilities
$\pi_i$, this random distribution of the walkers will not be altered if
each walker steps once according to $P$. From a strictly mathematical
point of view, the steady state condition (1.12) requires the one-step
probability matrix $P$ to have the state probability vector

$$ \pi \equiv (\pi_1, \pi_2, \ldots, \pi_B) \qquad (1.14) $$

as a left eigenvector with eigenvalue unity; that is, $\pi P = \pi$.
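Both statements, the steady state relation $\pi P = \pi$ and the limit (1.13), can be verified numerically for a small matrix. The sketch below uses a hypothetical three-state $P$ whose steady state vector is $(0.25, 0.5, 0.25)$; the helper names are likewise assumptions. The example matrix has nonzero diagonal entries, so the walk cannot lock into a periodic cycle, and the sketch relies on that property.

```python
def left_multiply(vec, P):
    """Return the row vector vec * P, i.e. one step applied to a distribution."""
    B = len(P)
    return [sum(vec[i] * P[i][j] for i in range(B)) for j in range(B)]

def matrix_power(P, n):
    """Return P^n by repeated multiplication, starting from the identity."""
    B = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(B)] for i in range(B)]
    for _ in range(n):
        result = [[sum(result[i][k] * P[k][j] for k in range(B))
                   for j in range(B)] for i in range(B)]
    return result

# Hypothetical one-step matrix and its steady state probabilities.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

stepped = left_multiply(pi, P)     # should reproduce pi, as in (1.12)
P50 = matrix_power(P, 50)          # every row should approach pi, as in (1.13)
print(stepped)
print(P50[0], P50[2])
```

Each row of `P50` corresponds to a different starting state $x_i$, so agreement between the rows is the "independently of i" part of the theorem.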


Before inquiring into the possibility of finding, for a given set of
state probabilities $\pi_i$, a set of one-step probabilities $p_{ij}$
satisfying conditions (A) and (B), let us consider what we should do once
we have found such a set. On the basis of conventional Monte Carlo theory
we might proceed as follows: Starting out in any allowable initial state
$x_{i_0}$, take a "sufficiently large number n" of successive steps
according to $p_{ij}$, and then regard the state arrived at at the nth
step to be a state randomly chosen inside $\Omega$ according to the
prescribed state probabilities $\pi_i$. Repeating this process N-1 more
times would give us the N random points $x_{i_j}$ necessary to calculate
the estimate of $I$ in (1.4).

An obvious question here is precisely what constitutes a "sufficiently
large value" of n; in other words, what is the smallest value of n that




Like the Central Limit Theorem which forms the basis for the conventional
Monte Carlo method [cf. Sec. 3-2], the Central Limit Theorem for Markov
chains does not tell us how large N must be in order for the Gaussian
distribution to be realized. However, in view of the asymptotic nature
of (1.13), one has the intuitive suspicion that N will usually have to
be very much larger for the Markov chain process than for the ordinary
Monte Carlo process in order for Gaussian results to be obtained. If the
convergence of (1.13) were "immediate", that is if $p_{ij}^{(n)} = \pi_j$
for $n = 1$, then the Markov chain average in (1.15) would evidently be
equivalent to the conventional Monte Carlo average in (1.4). But to the
extent that the convergence of (1.13) requires large values of n, it seems
reasonable to suppose that the Markov chain average in (1.15) will require
many more terms in order to yield results comparable to what might be
expected from a conventional Monte Carlo average.

Let us now consider how one might go about constructing, for a given
set of state probabilities $\pi_i$, a set of one-step probabilities
$p_{ij}$ satisfying conditions (A) and (B). It turns out that, so far as
the ergodic condition (A) is concerned, one usually can only hope for the
best. Certainly, one should not use any walking scheme which is obviously
incapable of reaching all possible states $x_i$ in $\Omega$. However,
difficulties can arise if $\Omega$ consists of "islands" of high
probability states in a "sea" of very low probability states, in which
case the probability of walking from one island to another will, for most
walking schemes, be very small. For reasonable values of N one might very
well just walk around on only one island, and if the function $f$ varied
significantly from island to island erroneous results would obviously be
obtained. Unfortunately, one is rarely able to definitely rule out this
possibility in any specific calculation.
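The island difficulty can be made quantitative on a toy lattice. The sketch below is built entirely on assumptions: nine states on a line, two islands of weight 1 separated by a sea of weight $10^{-12}$, and a nearest-neighbor walk in which a proposed move from $x_i$ to $x_j$ is accepted with probability $\min(1, \pi_j/\pi_i)$ (one common walking scheme, adopted here only for illustration). Rather than simulating a single walker, it propagates the exact distribution of walker positions, so the result involves no random numbers.

```python
def step_distribution(q, weights):
    """Advance the distribution q of walker positions by one step of the assumed walk."""
    B = len(weights)
    new_q = [0.0] * B
    for i in range(B):
        stay = q[i]
        for j in (i - 1, i + 1):             # propose left or right, each with probability 1/2
            if 0 <= j < B:
                move = 0.5 * min(1.0, weights[j] / weights[i]) * q[i]
                new_q[j] += move
                stay -= move
        new_q[i] += stay                     # rejected or off-lattice proposals stay put
    return new_q

weights = [1.0] * 3 + [1e-12] * 3 + [1.0] * 3   # two islands separated by a "sea"
q = [1.0] + [0.0] * 8                           # all walkers start on the left island
for _ in range(2000):
    q = step_distribution(q, weights)

left_mass = sum(q[:3])
right_mass = sum(q[6:])
stationary_right = sum(weights[6:]) / sum(weights)
print(left_mass, right_mass, stationary_right)
```

In the steady state the right island carries about half of the total probability, yet after 2000 steps essentially none of it has arrived there; any average of $f$ taken over such a walk would reflect the left island only.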

The steady state condition (B) can usually be satisfied rather easily.
Notice first of all that condition (1.12) depends only on the relative
magnitudes of the state probabilities $\pi_i$; furthermore, it is seen
from (1.3) that the probabilities $\pi_i$ themselves depend only on the
relative magnitudes of the density function $P$ at various discrete
states. Consequently we can always write $\pi_i$ simply as $P(x_i)$ in any
conveniently unnormalized form, and thereby avoid the (usually impossible)
tasks of (i) evaluating the normalization constant for the density
function $P(x)$, and (ii) evaluating the normalizing denominator in (1.3)
for $\pi_i$.

One way of setting up a one-step matrix $P$ which satisfies the steady
state condition (B) is as follows: Let $p^*_{ij}$ be any set of one-step
probabilities which satisfies, in addition to the usual conditions
(1.6) and (1.7), the condition

$$ p^*_{ij} = p^*_{ji} \qquad (1.16) $$

[It is usually easy to set up such a symmetric one-step probability
matrix $P^*$ in any specific application.] We then define $P$ in terms of
$P^*$ and $\pi$ according to

$$ p_{ij} = \begin{cases} p^*_{ij}\,\pi_j/\pi_i, & \text{if } \pi_j/\pi_i < 1 \\ p^*_{ij}, & \text{if } \pi_j/\pi_i \ge 1 \end{cases} \quad (i \ne j) $$

$$ p_{ii} = p^*_{ii} + \sum_{j:\,\pi_j/\pi_i < 1} p^*_{ij}\,(1 - \pi_j/\pi_i) \qquad (1.17) $$


It is not difficult to show that this set $p_{ij}$ satisfies (1.6), (1.7)
and (1.12) [see p. 119 of Ref. 1 for a proof]. We might note that the
satisfaction of the steady state condition (1.12) essentially results
from the fact that (1.17) is so constructed that

$$ \pi_i\,p_{ij} = \pi_j\,p_{ji} \qquad (1.18) $$

From this it follows, using (1.7), that

$$ \sum_{i=1}^{B} \pi_i\,p_{ij} = \sum_{i=1}^{B} \pi_j\,p_{ji} = \pi_j \sum_{i=1}^{B} p_{ji} = \pi_j $$

which is just (1.12). We should remark that (1.17) is not the only
scheme for constructing $P$ so that (1.18) is satisfied; furthermore,
(1.18) itself is not a necessary condition for satisfying the steady state
condition (1.12).
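The construction (1.17) and the properties claimed for it can be checked directly on a small case. In the sketch below the four unnormalized state weights, the uniform symmetric matrix $p^*_{ij} = 1/B$ and the function name `build_walk_matrix` are assumptions made for illustration; the body of the function follows (1.17) term by term.

```python
def build_walk_matrix(p_star, weights):
    """Construct p_ij from a symmetric p*_ij and unnormalized pi_i, following (1.17)."""
    B = len(weights)
    P = [[0.0] * B for _ in range(B)]
    for i in range(B):
        diagonal = p_star[i][i]
        for j in range(B):
            if j == i:
                continue
            ratio = weights[j] / weights[i]            # pi_j / pi_i
            if ratio < 1.0:
                P[i][j] = p_star[i][j] * ratio
                # The remainder of this proposal's probability stays at x_i
                diagonal += p_star[i][j] * (1.0 - ratio)
            else:
                P[i][j] = p_star[i][j]
        P[i][i] = diagonal
    return P

weights = [1.0, 2.0, 3.0, 4.0]                 # hypothetical unnormalized pi_i
B = len(weights)
p_star = [[1.0 / B] * B for _ in range(B)]     # symmetric, rows sum to 1
P = build_walk_matrix(p_star, weights)

row_sums = [sum(row) for row in P]                                   # (1.7)
balance = max(abs(weights[i] * P[i][j] - weights[j] * P[j][i])
              for i in range(B) for j in range(B))                   # (1.18)
steady = [sum(weights[i] * P[i][j] for i in range(B)) for j in range(B)]  # (1.12)
print(row_sums, balance, steady)
```

Because only the ratios $\pi_j/\pi_i$ appear, the weights are used without ever being normalized, in line with the remark following (1.12).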

To actually realize a Markov chain according to (1.17), suppose the
$t$th state of the Markov chain is $x_i$ and we wish to determine the
$(t+1)$th state. First, we pick a tentative state $x_j$ according to
$p^*_{ij}$, and we calculate the probability ratio $\pi_j/\pi_i$ for these
two states.† If $\pi_j/\pi_i \ge 1$, we take $x_j$ to be the $(t+1)$th
state of the Markov chain. If $\pi_j/\pi_i < 1$, we pick a random number
$r$ from the uniform distribution in the unit interval, and we compare
$\pi_j/\pi_i$ with $r$; if $\pi_j/\pi_i \ge r$ we take $x_j$ as the
$(t+1)$th state, but if $\pi_j/\pi_i < r$ we take $x_i$ as the $(t+1)$th
state. It is important to note that $p_{ii}$ is usually not zero, so one
should have no misgivings about winding up with the same state $x_i$ for
both the $t$th and the $(t+1)$th states of the chain; indeed [see p. 122
of Ref. 11], one will introduce errors if one forces a genuine change of
state at every step of the random walk.

† For the microcanonical ensemble function $P(x)$ mentioned just after
(1.1), the ratio $\pi_j/\pi_i$ is simply $\exp(-\Delta U_{ij}/kT)$, where
$\Delta U_{ij} = U(x_j) - U(x_i)$ is the change in the total energy of the
system in going from state $x_i$ to state $x_j$.
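The stepping rule just described can be sketched in a few lines of Python. The four-state weights, the choice of a uniform tentative-state distribution $p^*_{ij} = 1/B$, the values assigned to $f$ and the function name `markov_chain` are all assumptions made for illustration. Note that when a tentative state is rejected, the old state is recorded again as the next state of the chain.

```python
import random

def markov_chain(weights, n_steps, start=0, seed=0):
    """Generate a chain of state indices using the accept/reject rule in the text."""
    rng = random.Random(seed)
    B = len(weights)
    i = start
    chain = []
    for _ in range(n_steps):
        j = rng.randrange(B)                 # tentative state from symmetric p*_ij = 1/B
        ratio = weights[j] / weights[i]      # pi_j / pi_i, normalization not needed
        if ratio >= 1.0 or ratio >= rng.random():
            i = j                            # accept the tentative state
        chain.append(i)                      # on rejection the old state is counted again
    return chain

weights = [1.0, 2.0, 3.0, 4.0]               # hypothetical unnormalized pi_i
f_values = [0.0, 1.0, 4.0, 9.0]              # hypothetical f(x_i)
chain = markov_chain(weights, 100000)

n = len(chain)
frequencies = [chain.count(s) / n for s in range(len(weights))]
target = [w / sum(weights) for w in weights]
chain_average = sum(f_values[s] for s in chain) / n
print(frequencies, target, chain_average)
```

For these assumed values the lattice sum $\sum_i f(x_i)\,\pi_i$ equals 5, which the chain average should approach as the number of steps grows.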

For further details on applying the Markov chain Monte Carlo method to
specific problems in statistical mechanics, the reader should consult the
review article by Wood [Ref. 11] and references contained therein. We
shall leave the subject at this point with the following general comments.

The Markov chain Monte Carlo method has shown itself capable of calculating
certain kinds of integrals which cannot presently be calculated either by
classical methods or by the conventional Monte Carlo method. However, the
Markov chain method requires a considerable amount of care and expertise
on the part of the user, more so than does the conventional Monte Carlo
method. A prime source of uneasiness when using the Markov chain approach
is that one is almost never sure to what extent the ergodic condition (A)
is satisfied. In addition, in view of the asymptotic nature of the result
(1.13), one cannot help but wonder how large N will have to be in order for
the Central Limit Theorem for Markov chains to truly govern the accuracy
of the approximation (1.15). For these reasons (which this writer freely
admits may be due to his own lack of experience with Markov chain
calculations) this writer is of the opinion that the Markov chain Monte
Carlo method should be attempted only if the conventional Monte Carlo
method is clearly inapplicable. Others may disagree with this opinion;
perhaps the one-dimensional gas of impenetrable rods considered in
Sec. 2-10 might afford a vehicle for comparing the efficiency and
reliability of the two Monte Carlo methods.